Abstract

The Context Interchange (COIN) strategy [44; 48] presents a novel perspective for mediated
data access in which semantic conflicts among heterogeneous systems are not identified
a priori, but are detected and reconciled by a Context Mediator through comparison of
contexts associated with any two systems engaged in data exchange. In this paper, we present
a formal characterization and reconstruction of this strategy in a COIN framework, based on
a deductive object-oriented data model and language called COIN. The COIN framework
provides a logical formalism for representing data semantics in distinct contexts. We show that
this presents a well-founded basis for reasoning about semantic disparities in heterogeneous
systems. In addition, it combines the best features of loose- and tight-coupling approaches
in defining an integration strategy that is scalable, extensible and accessible. These latter
features are made possible by allowing complexity of the system to be harnessed in small
chunks, by enabling sources and receivers to remain loosely-coupled to one another, and by
sustaining an infrastructure for data integration.

Keywords: Context, heterogeneous databases, logic and databases, mediators, semantic
interoperability.
1      Introduction

The last few years have witnessed an unprecedented growth in the number of information sources
(traditional database systems, data feeds, web-sites, and applications providing structured or
semi-structured data on request) and receivers (human users, data warehouses, and applications
that make data requests). This is spurred on by a number of factors: for example, ease
of access to the Internet, which is emerging as the de facto global information infrastructure,
and the rise of new organizational forms (e.g., adhocracies and networked organizations [36]),
which mandate new ways of sharing and managing information. The advances in networking
and telecommunications technologies have led to increased physical connectivity (the ability
to exchange bits and bytes), but not necessarily logical connectivity (the ability to exchange
information meaningfully). This problem is traditionally referred to as the need for semantic
interoperability [47] among autonomous and heterogeneous systems.

1.1     Context  Interchange:  Background 

The goal of this paper is to describe a novel approach, called Context Interchange (COIN),
for achieving semantic interoperability among heterogeneous sources and receivers. The COIN
strategy described in this paper draws its inspiration from earlier work reported in [48; 44].
Specifically, we share the basic tenets that

•  the detection and reconciliation of semantic conflicts are system services which are
provided by a Context Mediator, and should be transparent to a user; and

•  the provision of such a mediation service requires only that the user furnish a logical
(declarative) specification of how data are interpreted in sources and receivers, and how
conflicts, when detected, should be resolved, but not what conflicts exist a priori between
any two systems.

These insights are novel because they depart from classical integration strategies which
either require users to engage in the detection and reconciliation of conflicts (in the case of
loosely-coupled systems; e.g., MRDSM [33], VIP-MDBMS [29]), or insist that conflicts should
be identified and reconciled, a priori, by some system administrator, in one or more shared
schemas (as in tightly-coupled systems; e.g., Multibase [30], Mermaid [49]).

Unfortunately, as interesting as these ideas may be, they would remain vague musings
in the absence of a formal conceptual foundation. One attempt at identifying a conceptual
basis for Context Interchange is the semantic-value model [44], where each data element is
augmented with a property-list which defines its context. This model, however, remains
fraught with ambiguity. For example, it relies on implicit agreement on what the modifiers
for different attributes are, as well as what conversion functions are applicable for different
kinds of conflicts, and is silent on how different conversion definitions can be associated with
distinct contexts. Defining the semantics of data through annotations attached to individual
data elements also tends to be cumbersome, and there is no systematic way of promoting the
sharing and reuse of the semantic representations. At the same time, the representational
formalism remains somewhat detached from the underlying conflict detection algorithm (the
subsumption algorithm [48]). Among other problems, this algorithm requires conflict detection
to be done on a pairwise basis (i.e., by comparing the context definitions for two systems at a
time), and is non-committal on how a query plan for multiple sources can be constructed based
on the sets of pairwise conflicts. Furthermore, the algorithm limits meta-attributes to only a
single level (i.e., property lists cannot be nested), and is not able to take advantage of known
constraints for pruning off conflicts which are guaranteed never to occur.

1.2     Summary  of  Contributions 

We have two (intertwined) objectives in this paper. First, we aim to provide a formal foundation
for the Context Interchange strategy that will not only rectify the problems described earlier,
but also provide for an integration of the underlying representational and reasoning formalisms.
Second, the deconstruction and subsequent reconstruction^1 of the Context Interchange approach
described in [48; 44] provides us with the opportunity to address the concern for integration
strategies that are scalable, extensible and accessible.

Our formal characterization of the Context Interchange strategy takes the form of a COIN
framework, based on the COIN data model, which is a customized subset of the deductive
object-oriented model called Gulog^2 [15]. COIN is a "logical" data model in the sense that it
uses logic as a formalism for representing knowledge and for expressing operations. The logical
features of COIN provide us with a well-founded basis for making inferences about semantic
disparities that exist among data in different contexts. In particular, a COIN framework can be
translated to a normal program [34] (equivalently, a Datalog^neg program) for which the semantics
is well-defined, and where sound computational procedures for query answering exist. Since
there is no real distinction between factual statements (i.e., data in sources) and knowledge
(i.e., statements encoding data semantics) in this logical framework, both queries on data
sources (data-level queries) as well as queries on data semantics (knowledge-level queries) can
be processed in an identical manner. As an alternative to the classic deductive framework,
we investigate the adoption of an abductive framework [27] for query processing. Interestingly,
although abduction and deduction are "mirror-images" of each other [14], the abductive answers,
computed using a simple extension to classic SLD-resolution, lead to intensional answers as
opposed to the extensional answers that would be obtained via deduction. Intensional answers are
useful in our framework for a number of conceptual and practical reasons. In particular, if the
query is issued by a "naive" user under the assumption that there are no conflicts whatsoever,
the intensional answer obtained can be interpreted as the corresponding mediated query in which
database accesses are interleaved with data transformations required for mediating potential
conflicts. Finally, by checking the consistency of the abducted answers against known integrity
constraints, we show that the abducted answer can be greatly simplified, demonstrating a clear
connection to what is traditionally known as semantic query optimization [7].

^1 In this deconstruction, we tease apart different elements of the Context Interchange strategy
with the goal of understanding their contributions individually and collectively. The
reconstruction examines how the same features (and more) can be accomplished differently within
the formalism we have invented.

^2 Gulog is itself a variant of F-logic [28].

As much as it is a logical data model, COIN is also an "object-oriented" data model
because it adopts an "object-centric" view of the world and supports many of the features (e.g.,
object-identity, type-hierarchy, inheritance, and overriding) commonly associated with
object-orientation. The standard use of abstraction, inheritance, as well as structural and behavioral
inheritance [28] presents many opportunities for sharing and reuse of semantic encodings.
Conversion functions (for transforming the representation of data between contexts) can be modeled
as methods attached to types in a natural fashion. Unlike "general purpose" object-oriented
formalisms, we make some adjustments to the structure of our model by distinguishing between
different kinds of objects which have particular significance for our problem domain. In
particular, we introduce the notion of context-objects, described in [37], as reified representations for
collections of statements about particular contexts. This allows context knowledge to be defined
with a common reference point and is instrumental in providing a structuring mechanism for
making inferences across multiple theories which may be mutually inconsistent.

The reconstruction of the Context Interchange strategy allows us to go beyond the classical
concern of "non-intrusion", and provides a formulation that is scalable, extensible and
accessible [21]. By scalability, we require that the complexity of creating and administering
(maintaining) the interoperation services should not increase exponentially with the number of
participating sources and receivers. Extensibility refers to the ability to incorporate changes in
a graceful manner; in particular, local changes should not have adverse effects on other parts
of the larger system. Finally, accessibility refers to how the system is perceived by a user in
terms of its ease-of-use and flexibility in supporting different kinds of queries.

The above concerns are addressed in two ways in the reconstructed Context Interchange
strategy^3. Sources are made more accessible to users by shifting
the burden for conflict detection and mediation to the system; by supporting multiple paradigms
for data access, namely queries formulated directly on sources as well as queries mediated
by views; by making knowledge of disparate semantics accessible to users through
knowledge-level queries answered with intensional answers; and by providing feedback in
the form of mediated queries. Scalability and extensibility are addressed by maintaining the
distinction between the representation of data semantics as is known in individual contexts,
and the detection of potential conflicts that may arise when data are exchanged between two
systems; by the structural arrangement that allows data semantics to be specified with reference
to complex object types in the domain model as opposed to annotations tightly-coupled to the
individual database schemas; by allowing multiple systems with distinct schemas to bind to the
same contexts; by the judicious use of object-oriented features, in particular, inheritance and
overriding in the type system present in the domain model; and by sustaining an infrastructure
for data integration that combines these features. As a special effort in providing such an
infrastructure, we introduce a meta-logical extension of the COIN framework which allows sets
of context axioms to be "objectified" and placed in a hierarchy, such that new and more complex
contexts can be derived through a hierarchical composition operator [5]. This mechanism,
coupled with type inheritance, constitutes a powerful approach for incorporating changes (e.g.,
the addition of a new system, or changes to the domain model) in a graceful manner.

Finally, we remark that the feasibility and features of this approach have been demonstrated
in a prototype implementation which provides mediated access to traditional database systems
(e.g., Oracle databases) as well as semi-structured data (e.g., Web-sites)^4.

1.3      Organization  of  this  Paper 

The rest of this paper is organized as follows. Following this introduction, we present a
motivational example which is used to highlight selected features of the Context Interchange strategy.
We demonstrate in Section 3 the novelty of our approach by comparing it with a variety of
other classical and contemporary approaches. Section 4 describes the structure and language
of the COIN data model, which forms the basis for the formal characterization of the Context
Interchange strategy in a COIN framework. Section 5 introduces the abduction framework and
illustrates how abductive inference is used as the basis for obtaining intensional answers, and
in particular, mediated answers corresponding to naive queries (i.e., those formulated while
assuming that sources are homogeneous). Section 6 describes a meta-logical extension to the
COIN framework which allows changes to be incorporated in a graceful manner. Section 7
describes the prototype system which provides integrated access to a variety of heterogeneous
sources while adopting the framework which we have described. We conclude in Section 8 with
a number of suggestions for further work.

^3 For the sake of brevity, further references made to the Context Interchange strategy from this
point on refer to this reconstruction unless otherwise specified.

^4 This prototype is accessible from any WWW-client (e.g., Netscape Browser) and can be
demonstrated upon request.

2     Motivational  Example 

Consider the scenario shown in Figure 1, deliberately kept simple for didactical reasons. Data
on "revenue" and "expenses" (respectively) for some collection of companies are available in
two autonomously-administered data sources, each comprised of a single relation^5. Suppose
a user is interested in knowing which companies have been "profitable" and their respective
revenue: this query can be formulated directly on the (export) schemas of the two sources as
follows^6:

Q1:  SELECT r1.cname, r1.revenue FROM r1, r2
     WHERE r1.cname = r2.cname AND r1.revenue > r2.expenses;

In  the  absence  of  any  mediation,  this  query  will  return  the  empty  answer  if  it  is  executed  over 
the  extensional  data  set  shown  in  Figure  1. 
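
The empty answer can be reproduced with a small sketch. The following is a minimal illustration, not part of the prototype: it assumes that the two relations of Figure 1 are loaded into a single in-memory SQLite database (the helper name build_sources is hypothetical) and simply executes Q1 without any mediation.

```python
import sqlite3


def build_sources():
    """Load relations r1 and r2 from Figure 1 into one in-memory database.

    Placing both sources in a single SQLite database is an assumption made
    only for illustration; in the paper they are autonomous sources.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE r1 (cname TEXT, revenue INTEGER, currency TEXT);
        CREATE TABLE r2 (cname TEXT, expenses INTEGER);
        INSERT INTO r1 VALUES ('IBM', 1000000, 'USD');
        INSERT INTO r1 VALUES ('NTT', 1000000, 'JPY');
        INSERT INTO r2 VALUES ('IBM', 1500000);
        INSERT INTO r2 VALUES ('NTT', 5000000);
        """
    )
    return conn


# Q1 exactly as formulated by the user, with no mediation applied.
Q1 = """
SELECT r1.cname, r1.revenue FROM r1, r2
WHERE r1.cname = r2.cname AND r1.revenue > r2.expenses
"""

naive_answer = build_sources().execute(Q1).fetchall()
print(naive_answer)  # no company appears profitable when raw figures are compared
```

Neither tuple qualifies because the raw revenue figures (1 000 000 in both cases) are compared directly with expenses, regardless of currency or scale-factor.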

The above query, however, does not take into account the fact that data sources are
administered independently and have different contexts: i.e., they may embody different assumptions
on how information contained therein should be interpreted. To simplify the ensuing
discussion, we assume that the data reported in the two sources differ only in the currencies and
scale-factors of "company financials" (i.e., financial figures pertaining to the companies, which
include revenue and expenses). Specifically, in Source 1, all "company financials" are reported
using the currency shown and a scale-factor of 1; the only exception is when they are reported
in Japanese Yen (JPY), in which case the scale-factor is 1000. Source 2, on the other hand,
reports all "company financials" in USD using a scale-factor of 1. In the light of these remarks,


^5 Throughout this paper, we make the assumption that the relational data model is adopted to be
the canonical data model [47]: i.e., we assume that the database schemas exported by the sources
are relational and that queries are formulated using SQL (or some extension thereof). This
simplifies the discussion by allowing us to focus on semantic conflicts in disparate systems
without being distracted by conflicts over data model constructs. The choice of the relational
data model is one of convenience rather than necessity, and is not to be construed as a
constraint of the integration strategy being proposed.

^6 We assume, without loss of generality, that relation names are unique across all data sources.
This can always be accomplished via some renaming scheme: say, by prefixing the relation name
with the name of the data source (e.g., db1#r1).


Context c1

All "company financials" ("revenue" inclusive) are reported in the currency shown for each company.
All "company financials" are reported using a scale-factor of 1, except for items reported in JPY,
where the scale-factor is 1000.

Source 1: r1

cname    revenue      currency
IBM      1 000 000    USD
NTT      1 000 000    JPY

Context c2

All "company financials" are reported in USD, using a scale-factor of 1.

User query

select r1.cname, r1.revenue
from r1, r2
where r1.cname = r2.cname
and r1.revenue > r2.expenses

Source 2: r2

cname    expenses
IBM      1 500 000
NTT      5 000 000

Source 3: r3

fromCur    toCur    exchangeRate
USD        JPY      104.0
JPY        USD      .0096

Figure 1: Scenario for the motivational example.


the (empty) answer returned by executing Q1 is clearly not a "correct" answer since the
revenue of NTT (9,600,000 USD = 1,000,000 x 1,000 x 0.0096) is numerically larger than the
expenses (5,000,000) reported in r2.

In the rest of this section, we present an overview of the "functional" features of the Context
Interchange strategy in providing integrated access to data in heterogeneous environments such
as illustrated above. Our discussion is organized in two parts: the first examines the distinctive
properties of our approach from a user perspective; the second focuses on features that are
novel from a system perspective, with particular attention to its scalability and extensibility.
Bear in mind that these discussions are aimed at providing the underlying intuition without
necessarily being precise at all times; formal definitions of the underlying representational and
inference formalisms are postponed to Sections 4, 5, and 6 of this paper.

2.1     Functional  Features  of  Context  Interchange:  A  User  Perspective 

Unlike classical and contemporary approaches, the Context Interchange approach provides users
with a wide array of options on how and what queries can be asked and the kinds of answers
which can be returned. These features work in tandem to allow greater flexibility and
effectiveness in gaining access to information present in multiple, heterogeneous systems.

Query  Mediation:  Automatic  Detection  and  Reconciliation  of  Conflicts 

In a Context Interchange system, the same query (Q1) can be submitted to a specialized
mediator [54], called a Context Mediator, which rewrites the query so that data exchange
among sites having disparate contexts is interleaved with appropriate data transformations
and access to ancillary data sources (when needed). We refer to this transformation as query
mediation and the resulting query as the corresponding mediated query.
For example, the mediated query MQ1 corresponding to Q1 is given by:

MQ1: SELECT r1.cname, r1.revenue FROM r1, r2
     WHERE r1.currency = 'USD' AND r1.cname = r2.cname
     AND r1.revenue > r2.expenses
     UNION
     SELECT r1.cname, r1.revenue * 1000 * r3.rate FROM r1, r2, r3
     WHERE r1.currency = 'JPY' AND r1.cname = r2.cname
     AND r3.fromCur = r1.currency AND r3.toCur = 'USD'
     AND r1.revenue * 1000 * r3.rate > r2.expenses
     UNION
     SELECT r1.cname, r1.revenue * r3.rate FROM r1, r2, r3
     WHERE r1.currency <> 'USD' AND r1.currency <> 'JPY'
     AND r3.fromCur = r1.currency AND r3.toCur = 'USD'
     AND r1.cname = r2.cname AND r1.revenue * r3.rate > r2.expenses;

This mediated query considers all potential conflicts between relations r1 and r2 when
comparing values of "revenue" and "expenses" as reported in the two different contexts. Moreover,
the answers returned may be further transformed so that they conform to the context of the
receiver. Thus in our example, the revenue of NTT will be reported as 9 600 000 as opposed to
1 000 000. More specifically, the three-part query shown above can be understood as follows.
The first subquery takes care of tuples for which revenue is reported in USD using scale-factor
1; in this case, there is no conflict. The second subquery handles tuples for which revenue is
reported in JPY, implying a scale-factor of 1000. Finally, the last subquery considers the case
where the currency is neither JPY nor USD, in which case only currency conversion is needed.
Conversion among different currencies is aided by the ancillary data source r3 which provides
currency conversion rates. The mediated query MQ1, when executed, returns the "correct" answer
consisting only of the tuple <'NTT', 9 600 000>.
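
The behavior of the mediated query can be checked with a short sketch. It assumes, purely for illustration, that all three relations of Figure 1 reside in one in-memory SQLite database, and that the exchange-rate column of r3 is named rate as in MQ1 (Figure 1 labels it exchangeRate). The result is rounded because the conversion uses floating-point arithmetic.

```python
import sqlite3

# Illustrative setup: the three sources of Figure 1 in one in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE r1 (cname TEXT, revenue INTEGER, currency TEXT);
    CREATE TABLE r2 (cname TEXT, expenses INTEGER);
    CREATE TABLE r3 (fromCur TEXT, toCur TEXT, rate REAL);
    INSERT INTO r1 VALUES ('IBM', 1000000, 'USD');
    INSERT INTO r1 VALUES ('NTT', 1000000, 'JPY');
    INSERT INTO r2 VALUES ('IBM', 1500000);
    INSERT INTO r2 VALUES ('NTT', 5000000);
    INSERT INTO r3 VALUES ('USD', 'JPY', 104.0);
    INSERT INTO r3 VALUES ('JPY', 'USD', 0.0096);
    """
)

# The mediated query MQ1: one subquery per combination of currency and scale-factor.
MQ1 = """
SELECT r1.cname, r1.revenue FROM r1, r2
WHERE r1.currency = 'USD' AND r1.cname = r2.cname
  AND r1.revenue > r2.expenses
UNION
SELECT r1.cname, r1.revenue * 1000 * r3.rate FROM r1, r2, r3
WHERE r1.currency = 'JPY' AND r1.cname = r2.cname
  AND r3.fromCur = r1.currency AND r3.toCur = 'USD'
  AND r1.revenue * 1000 * r3.rate > r2.expenses
UNION
SELECT r1.cname, r1.revenue * r3.rate FROM r1, r2, r3
WHERE r1.currency <> 'USD' AND r1.currency <> 'JPY'
  AND r3.fromCur = r1.currency AND r3.toCur = 'USD'
  AND r1.cname = r2.cname AND r1.revenue * r3.rate > r2.expenses
"""

# Round the converted revenue to whole USD to absorb floating-point noise.
mediated_answer = [(cname, round(revenue)) for cname, revenue in conn.execute(MQ1)]
print(mediated_answer)
```

Only the second subquery contributes a tuple here: IBM fails the comparison in the first subquery, and no company reports in a currency other than USD or JPY.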

Support  for  Views 

In the preceding example, the query Q1 is formulated directly on the export schema for the
various sources. While this provides a great deal of flexibility, it also requires users to know
what data are present where and be sufficiently familiar with the attributes in different schemas
(so as to construct a query). A simple and yet effective solution to these problems is to allow
views to be defined on the source schemas and have users formulate queries based on the view
instead. For example, we might define a view on relations r1 and r2, given by

CREATE VIEW v1 (cname, profit) AS
     SELECT r1.cname, r1.revenue - r2.expenses
     FROM r1, r2
     WHERE r1.cname = r2.cname;

In which case, query Q1 can be equivalently formulated on the view v1 as

VQ1: SELECT cname, profit FROM v1
     WHERE profit > 0;

While achieving essentially the same functionalities as tightly-coupled systems, notice that
view definitions in our case are no longer concerned with semantic heterogeneity and make no
attempts at identifying or resolving conflicts. In fact, any query on a view (say, VQ1 on v1)
can be trivially rewritten to a query on the source schema (e.g., Q1). This means that query
mediation can be undertaken by the Context Mediator as before.

Knowledge-Level  versus  Data-Level  Queries 

Instead of inquiring about stored data, it is sometimes useful to be able to query the semantics
of data which are implicit in different systems. Consider, for instance, the query based on a
superset of SQL^7:


^7 Sciore et al. [43] have described a similar (but not identical) extension of SQL in which context is treated
as a "first-class object". We are not concerned with the exact syntax of such a language here; the issue at hand is


Q2:  SELECT r1.cname, r1.revenue.scaleFactor IN c1,
            r1.revenue.scaleFactor IN c2 FROM r1
     WHERE r1.revenue.scaleFactor IN c1 <> r1.revenue.scaleFactor IN c2;

Intuitively, this query asks for companies for which scale-factors for reporting "revenue" in r1
(in context c1) differ from that which the user assumes (in context c2). We refer to queries
such as Q2 as knowledge-level queries, as opposed to data-level queries which are inquiries on
factual data present in data sources. Knowledge-level queries have received little attention in
the database literature and certainly have not been addressed by the data integration
community. This, in our opinion, is a significant gap since heterogeneity in disparate data sources
arises primarily from incompatible assumptions about how data are interpreted. Our ability
to integrate access to both data and semantics can be exploited by users to gain insights into
differences among particular systems ("Do sources A and B report a piece of data differently?
If so, how?"), or by a query optimizer which may want to identify sites with minimal conflicting
interpretations (to minimize costs associated with data transformations).

Interestingly, knowledge-level queries can be answered using the exact same inference
mechanism for mediating data-level queries. Hence, submitting query Q2 to the Context Mediator
will yield the result:

MQ2: SELECT r1.cname, 1000, 1 FROM r1
     WHERE r1.currency = 'JPY';

which indicates that the answer consists of companies for which the currency attribute has
value 'JPY', in which case the scale-factors in context c1 and c2 are 1000 and 1 respectively.
If desired, the mediated query MQ2 can be evaluated on the extensional data set to return an
answer grounded in actual data elements. Hence, if MQ2 is evaluated on the data set shown in
Figure 1, we would obtain the singleton answer <'NTT', 1000, 1>.
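
The evaluation of MQ2 can be sketched as follows. This is only an illustration under the assumption that relation r1 of Figure 1 is loaded into an in-memory SQLite database; it evaluates the mediated query and does not show how the Context Mediator derives MQ2 from Q2.

```python
import sqlite3

# Illustrative setup: relation r1 of Figure 1 in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE r1 (cname TEXT, revenue INTEGER, currency TEXT);
    INSERT INTO r1 VALUES ('IBM', 1000000, 'USD');
    INSERT INTO r1 VALUES ('NTT', 1000000, 'JPY');
    """
)

# MQ2 as given in the text: the scale-factors 1000 (context c1) and 1 (context c2)
# appear as constants, so only the currency test touches stored data.
MQ2 = "SELECT r1.cname, 1000, 1 FROM r1 WHERE r1.currency = 'JPY'"

extensional_answer = conn.execute(MQ2).fetchall()
print(extensional_answer)  # → [('NTT', 1000, 1)]
```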

Extensional  versus  Intensional  Answers 

Yet  another  feature  of  Context  Interchange  is  that  answers  to  queries  can  be  both  intensional 
and  extensional.  Extensional  answers  correspond  to  fact-sets  which  one  normally  expects  of 
a  database  retrieval.  Intensional  answers,  on  the  other  hand,  provide  only  a  characterization 
of  the  extensional  answers  without  actually  retrieving  data  from  the  data  sources.  In  the 
preceding  example,  MQ2  can  in  fact  be  understood  as  an  intensional  answer  for  Q2,  while  the 
tuple  obtained  by  the  evaluation  of  MQ2  constitutes  the  extensional  answer  for  Q2. 


how  we  might  support  the  underlying  inferences  needed  to  answer  such  queries. 
In the COIN framework, intensional answers are grounded in extensional predicates (i.e.,
names of relations), evaluable predicates (e.g., arithmetic operators or "relational" operators),
and external functions which can be directly evaluated through system calls. The intensional
answer is thus no different from a query which can normally be evaluated on a conventional
query subsystem of a DBMS. Query answering in a Context Interchange system is thus a
two-step process: an intensional answer is first returned in response to a user query; this can then
be executed on a conventional query subsystem to obtain the extensional answer.

The intermediary intensional answer serves a number of purposes [24]. Conceptually, it
constitutes the mediated query corresponding to the user query and can be used to confirm the
user's understanding of what the query actually entails. More often than not, the intensional
answer can be more informative and easier to comprehend compared to the extensional answer
it derives. (For example, the intensional answer MQ2 actually conveys more information than
merely returning the single tuple satisfying the query.) From an operational standpoint, the
computation of extensional answers is likely to be many orders of magnitude more expensive
than the computation of the corresponding intensional answer. It therefore makes good
sense not to continue with query evaluation if the intensional answer satisfies the user. From a
practical standpoint, this two-stage process allows us to separate query mediation from query
optimization and execution. As we will illustrate later in this paper, query mediation is driven
by logical inferences which do not bond well with the (predominantly cost-based) optimization
techniques that have been developed [40; 45]. The advantage of keeping the two tasks apart is
thus not merely a conceptual convenience, but allows us to take advantage of mature techniques
for query optimization in determining how best a query can be evaluated.

Query  Pruning 

Finally,  observe  that  consistency  checking  is  performed  as  an  integral  activity  of  the  mediation 
process,  allowing  intensional  answers  to  be  pruned  (in  some  cases,  significantly)  to  aurive  at 
answers  which  are  better  comprehensible  and  more  efficient.  For  example,  if  Ql  had  been 
modified  to  include  the  additional  condition  "rl. currency  =  '  JPY'",  the  intensional  answer 
returned  (MQl)  would  have  only  the  second  SELECT  statement  (but  not  the  first  and  the 
third)  since  the  other  two  would  have  been  inconsistent  with  the  newly  imposed  condition. 
This  pruning  of  the  intensional  answer,  accomplished  by  taking  into  consideration  integrity 
constraints  (present  as  part  of  a  query,  or  those  defined  on  sources)  and  knowledge  of  data 
semantics  in  distinct  systems,  constitutes  a  form  of  semantic  query  optimization  [7].  Consistency 
checking  however  can  be  an  expensive  operation  and  the  gains  from  a  more  efficient  execution 
must be balanced against the cost of performing the consistency check during query mediation.
In our case, however, the benefits are amplified since spurious conflicts that remain undetected
could  result  in  an  additional  conjunctive  query  involving  multiple  sources. 
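
The pruning described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the mediator's actual procedure: each branch of the intensional answer is reduced to a hypothetical predicate on r1.currency, and the three branch constraints below are assumed for the example rather than taken from MQ1 itself.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Branch:
    """One SELECT statement of an intensional answer (hypothetical shape)."""
    label: str
    admits: Callable[[str], bool]  # True if the branch can hold for this currency


# Assumed branch constraints, for illustration only.
MQ1: List[Branch] = [
    Branch("first SELECT", lambda cur: cur == "USD"),
    Branch("second SELECT", lambda cur: cur == "JPY"),
    Branch("third SELECT", lambda cur: cur not in ("USD", "JPY")),
]


def prune(branches: List[Branch], required_currency: str) -> List[Branch]:
    """Drop branches whose constraint contradicts r1.currency = required_currency."""
    return [b for b in branches if b.admits(required_currency)]


surviving = [b.label for b in prune(MQ1, "JPY")]
print(surviving)
```

A real consistency check would reason over arbitrary integrity constraints; the single-attribute predicate here only shows why inconsistent branches never reach query execution.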

2.2     Functional  Features  of  Context  Interchange:  A  System  Perspective 

It  is  natural  to  assume  the  internal  complexity  of  any  system  will  increase  in  commensuration 
with  the  external  functionalities  it  offers.  The  Context  Interchange  system  is  no  exception. 
We  make  no  claim  that  our  approach  is  "simple";  however,  we  submit  that  this  complexity 
is  decomposable  and  well-founded.  Decomposability  has  obvious  benefits  from  a  system  engi- 
neering perspective,  allowing  complexity  to  be  harnessed  into  small  chunks,  thus  making  our 
integration approach more endurable, even when the number of sources and receivers is
exponentially large and when changes are rampant. The complexity is said to be well-founded
because  it  is  possible  to  characterize  the  behavior  of  the  system  in  an  abstract  mathematical 
framework.  This  allows  us  to  understand  the  potential  (or  limits)  of  the  strategy  apart  from 
the  idiosyncrasies  of  the  implementation,  and  is  useful  for  providing  insights  into  where  and 
how  improvements  can  be  made.  We  describe  below  some  of  the  ways  in  which  complexity 
is  decoupled  in  a  Context  Interchange  system,  as  well  as  tangible  benefits  which  result  from 
formalization  of  the  integration  strategy. 

Representation  of  'Meaning'  as  opposed  to  Conflicts 

As  mentioned  earlier,  a  key  insight  of  the  Context  Interchange  strategy  is  that  we  can  represent 
the  meaning  of  data  in  the  underlying  sources  and  receivers  without  identifying  and  reconciling 
all  potential  conflicts  which  exist  between  any  two  systems.  Thus,  query  mediation  can  be  per- 
formed dynamically  (as  when  a  query  is  submitted)  or  it  can  be  used  to  produce  a  query  plan 
(the  mediated  query)  that  constitutes  a  locally-defined  view.  In  the  latter  case,  this  view  is  sim- 
ilar to  the  shared  schemas  in  tightly-coupled  systems  with  one  important  exception:  whenever 
changes do occur (say, when the semantics of data encoded in some context is changed⁸), the
Context Mediator can be triggered to reconstruct the local view automatically. Unlike the case
in tightly-coupled systems, this reconstruction requires no manual intervention from the system
administrators. In both of the above scenarios, changes in local systems are well-contained and
do not mandate human intervention in other parts of the larger system. This represents a
significant gain over tightly-coupled systems where maintenance of shared schemas constitutes
a major system bottleneck.


⁸ For an account of why this seemingly strange phenomenon may be more common than is widely believed,
see [52].

Contexts versus Schemas

Unlike  the  semantic-value  model,  the  COIN  data  model  adopts  a  very  different  conceptualization 
of  contexts:  instead  of  a  property-list  associated  with  individual  data  elements,  we  view  context 
as  consisting  of  a  collection  of  axioms  which  describes  a  particular  "situation"  a  source  or 
receiver is in.

Figure 2: A graphical representation of the relationships between semantic-types in the domain
model,  semantic-relations  (defined  on  semantic-objects),  and  data  elements  in  the  relation  r2. 


Figure  2  provides  a  graphical  representation  of  the  salient  features  of  the  structure  which 
is the key enabler for this representation. The domain model presents the definitions for the
"types" of information units (called semantic-types) that constitute a common vocabulary for
capturing the semantics of data in disparate systems. Instances of semantic-types are called
semantic-objects. Every data element in a source or receiver is mapped to a unique
semantic-object for which the object-id is a Skolem function [8] defined on some key values. For each
relation  r  present  in  a  source,  there  is  an  isomorphic  relation  r'  defined  on  the  corresponding 
semantic-objects.  Among  other  features,  this  structure  allows  us  to  encode  assumptions  of  the 
underlying  context  independently  of  structure  imposed  on  data  in  underlying  sources  (i.e.,  the 
schemes).  For  example,  the  constraint:  all  "company  financials"  are  reported  in  US  Dollars 
within context $c_2$ can be described using the clause⁹

$$X' : \mathit{companyFinancials},\ \mathit{currency}(c_2, X') : \mathit{currencyType} \;\vdash\; \mathit{currency}(c_2, X')[\mathit{value}(c_2) \to \text{'USD'}].$$

The method currency is said to be a modifier because it modifies how the value of a
semantic-object is reported. This framework is presented in detail in Section 4.

The dichotomy between schemas and contexts presents a number of opportunities for systematic
sharing and reuse of semantic encodings. For example, different sources and receivers
in  the  same  context  may  now  bind  to  the  same  set  of  context  axioms;  and  distinct  attributes 
which  correlate  with  one  another  (e.g.,  revenue,  expenses)  may  be  mapped  to  instances  of 
the  same  semantic-type  (e.g.,  companyFinancials).  These  circumvent  the  need  to  define  a  new 
property-list for all attributes in each schema. Unlike the semantic-value model, there is no
ambiguity about which modifiers can be introduced as "meta-attributes" since all properties of
semantic-types are explicitly defined in the domain model. As pointed out in the previous
section,  views  can  be  (more  simply)  defined  on  the  extensional  relation  independently  of  the 
context  descriptions.  This  constitutes  yet  another  benefit  of  teasing  apart  the  structure  of  a 
source,  and  semantics  which  are  implicit  in  its  context. 
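
A minimal Python sketch of this sharing follows. The source names (source_a, source_b) and the dictionary layout are hypothetical; only the semantic-type companyFinancials, the attributes revenue and expenses, and the US Dollar axiom for context c2 come from the discussion above.

```python
# Context axioms are stated once per context and semantic-type.
CONTEXT_AXIOMS = {
    "c2": {"companyFinancials": {"currency": "USD"}},
}

# Hypothetical schema bindings: (source, attribute) -> (semantic-type, context).
BINDINGS = {
    ("source_a", "revenue"): ("companyFinancials", "c2"),
    ("source_a", "expenses"): ("companyFinancials", "c2"),
    ("source_b", "revenue"): ("companyFinancials", "c2"),
}


def modifier_value(source: str, attribute: str, modifier: str) -> str:
    """Resolve a modifier through the shared context axioms, not per attribute."""
    sem_type, context = BINDINGS[(source, attribute)]
    return CONTEXT_AXIOMS[context][sem_type][modifier]


print(modifier_value("source_a", "expenses", "currency"))
```

No attribute carries its own property-list: all three bindings resolve to the single axiom recorded for c2.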

Inheritance and Overriding in Semantic-Types

Not  surprisingly,  the  various  features  frequently  associated  with  "object-orientation"  are  useful 
in our representation scheme as well. Semantic-types fall naturally into a generalization
hierarchy, which allows us to take advantage of structural and behavioral inheritance in achieving
economy  of  expression.  Structural  inheritance  allows  a  semantic-type  to  inherit  the  declara- 
tions defined  for  its  supertypes.  For  example,  the  semantic-type  companyFinancials  inherits 
from  moneyAmt  the  declarations  concerning  the  existence  of  the  "methods"  currency  and 
scaleFactor.  Behavioral  inheritance  allows  the  definitions  of  these  methods  to  be  inherited  as 
well. Hence, if we had defined earlier that instances of moneyAmt have a scale-factor of 1,

⁹ The syntax of this language is defined in Section 4 and corresponds to that of Gulog [15].

all instances of companyFinancials would inherit the same scale-factor since every instance of
companyFinancials is an instance of moneyAmt.

Inheritance  need  not  be  monotonic.  Non-monotonic  inheritance  means  that  the  declaration 
or  definition  of  a  method  can  be  overridden  in  a  subtype.  Thus,  inherited  definitions  can  be 
viewed as defaults which can always be changed to reflect the specificities at hand.
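
The analogy with ordinary object-oriented inheritance can be made concrete with a small Python sketch. It is only an analogy: methods in COIN are parameterized by context, which is ignored here, and the subtype ThousandsFinancials is a hypothetical name introduced solely to show overriding.

```python
class MoneyAmt:
    """Semantic-type moneyAmt: declares the modifiers currency and scaleFactor."""

    def scale_factor(self) -> int:
        return 1  # default definition, as in the example in the text

    def currency(self) -> str:
        raise NotImplementedError("no default currency is assumed in this sketch")


class CompanyFinancials(MoneyAmt):
    """Subtype: inherits both the declarations and the scale-factor definition."""


class ThousandsFinancials(CompanyFinancials):
    """Hypothetical subtype that overrides the inherited default."""

    def scale_factor(self) -> int:
        return 1000


print(CompanyFinancials().scale_factor(), ThousandsFinancials().scale_factor())
```

The inherited definition acts as a default: CompanyFinancials picks it up unchanged, while the hypothetical subtype replaces it without affecting its supertypes.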

Value  Conversion  Among  Different  Contexts 

Yet  another  benefit  of  adopting  an  object-oriented  model  in  our  framework  is  that  it  allows 
conversion  functions  on  values  to  be  explicitly  defined  in  the  form  of  methods  defined  on  various 
semantic  types.  For  example,  the  conversion  function  for  converting  an  instance  of  moneyAmt 
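from one scale-factor ($F$ in context $C$) to another ($F_1$ in context $c_1$) can be defined as follows:

$$
\begin{aligned}
X' : \mathit{moneyAmt} \;\vdash\;\; & X'[\mathit{cvt}(c_1)@\mathit{scaleFactor}, C, U \to V] \leftarrow X'[\mathit{scaleFactor}(c_1) \to \_[\mathit{value}(c_1) \to F_1]], \\
& X'[\mathit{scaleFactor}(C) \to \_[\mathit{value}(c_1) \to F]],\ V = U * F / F_1.
\end{aligned}
$$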

This  conversion  function,  unless  explicitly  overridden,  will  be  invoked  whenever  there  is  a 
request  for  scale-factor  conversion  on  an  object  which  is  an  instance  of  moneyAmt  and  when 
the conversion is to be performed with reference to context $c_1$. Overriding can take place along
the generalization hierarchy: as before, we may introduce a different conversion function for a
subtype of moneyAmt. Notice that this conversion function is defined with reference to context
$c_1$ only: in order for scale-factor conversion to take place in a different context, the conversion
function (which could be identical to the one in $c_1$, or not) will have to be defined explicitly. This
phenomenon allows different conversion functions to be associated with different contexts and
is a powerful mechanism for different users to introduce their own interpretations of disparate
data in a localized fashion. The apparent redundancy (in having multiple instances of the same
definition in different contexts) is addressed through the adoption of a context hierarchy which
is  described  next. 
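
The arithmetic in the body of this clause, together with the fact that the function is registered for one context only, can be sketched as follows. The registry CONVERSIONS and the helper convert are hypothetical names used for illustration; exact rational arithmetic is used so that no rounding is introduced.

```python
from fractions import Fraction


def cvt_scale_factor(u, f, f1):
    """Re-express a value u reported with scale-factor f using scale-factor f1.

    Mirrors the body of the clause: V = U * F / F1.
    """
    return Fraction(u) * Fraction(f) / Fraction(f1)


# Conversion functions are looked up per (context, semantic-type, modifier).
# Only context c1 has a definition, as in the text.
CONVERSIONS = {("c1", "moneyAmt", "scaleFactor"): cvt_scale_factor}


def convert(context, sem_type, modifier, u, f, f1):
    """Apply the conversion registered for this context, if there is one."""
    try:
        fn = CONVERSIONS[(context, sem_type, modifier)]
    except KeyError:
        raise LookupError(f"no {modifier} conversion defined in context {context}") from None
    return fn(u, f, f1)


# 1 000 000 reported in thousands, re-expressed in units (scale-factor 1).
v = convert("c1", "moneyAmt", "scaleFactor", 1_000_000, 1000, 1)
print(v)
```

Requesting the same conversion in another context fails until a definition is supplied there, which is the redundancy the context hierarchy is meant to remove.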

Hierarchical Composition of Contexts

By  "objectifying"  sets  of  axioms  associated  with  contexts,  we  can  introduce  a  hierarchical 
relationship among contexts. If c is a subcontext of c', then all the axioms defined in c' are
said to apply in c unless they are "overridden". An immediate application of this concept
is to make all "functional" contexts subcontexts of a default context $c_0$, which contains the
default declarations and method definitions. Under this scheme, a new context need
only identify how it differs from the default context and introduce the declarations and
method definitions which need to be changed (overridden). This is formulated as a meta-logical
extension of the COIN framework and will be described in further detail in Section 6.

Another advantage of having this hierarchy of contexts is the ability to introduce changes to
the domain model in an incremental fashion without having adverse effects on existing systems.
For example, suppose we need to add a new source for which currency units take on a different
representation (e.g., 'Japanese Yen' versus 'JPY'). This distinction has not been previously
captured in our domain model, which has hitherto assumed currency units have a homogeneous
representation. To accommodate the new data source, it is necessary to add a new modifier for
currencyType, say format, in the domain model:

$$\mathit{currencyType}[\mathit{format}(\mathit{ctx}) \Rightarrow \mathit{semanticString}].$$

Rather  than  making  changes  to  all  existing  contexts,  we  can  assign  a  default  value  to  this 
modifier in $c_0$, and at the same time, introduce a conversion function for mapping between
currency representations of different formats (e.g., 'Japanese Yen' and 'JPY'):

$$
\begin{aligned}
& X : \mathit{currencyType},\ \mathit{format}(c_0, X) : \mathit{semanticString} \;\vdash\; \mathit{format}(c_0, X)[\mathit{value}(c_0) \to \text{'abbreviated'}]. \\
& X : \mathit{currencyType} \;\vdash\; X[\mathit{cvt}(c_0)@C, U \to V] \leftarrow \ldots \text{(body)} \ldots
\end{aligned}
$$

The last step in this process is to add to the new context (c') the following context axiom:

$$X : \mathit{currencyType},\ \mathit{format}(c', X) : \mathit{semanticString} \;\vdash\; \mathit{format}(c', X)[\mathit{value}(c') \to \text{'full'}].$$

which  distinguishes  it  as  having  a  different  format. 
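
The effect of the default context can be sketched with a simple parent-chain lookup. This is an assumption-laden illustration rather than the meta-logical formulation deferred to Section 6: the Context class and its lookup method are hypothetical, and axioms are reduced to plain modifier values.

```python
class Context:
    """A context holding axioms (here: modifier values) with an optional parent."""

    def __init__(self, name, parent=None, axioms=None):
        self.name = name
        self.parent = parent
        self.axioms = dict(axioms or {})

    def lookup(self, sem_type, modifier):
        """Return the nearest definition, walking up towards the default context."""
        ctx = self
        while ctx is not None:
            if (sem_type, modifier) in ctx.axioms:
                return ctx.axioms[(sem_type, modifier)]
            ctx = ctx.parent
        raise KeyError((sem_type, modifier))


# The default context c0 receives the default value of the new modifier.
c0 = Context("c0", axioms={("currencyType", "format"): "abbreviated"})
# An existing context needs no change at all.
c1 = Context("c1", parent=c0)
# The new context c' overrides only what differs.
c_new = Context("c'", parent=c0, axioms={("currencyType", "format"): "full"})

print(c1.lookup("currencyType", "format"), c_new.lookup("currencyType", "format"))
```

Existing contexts pick up the default without being edited, which is what makes the change to the domain model incremental.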

3     Context  Interchange  vis-a-vis  Traditional  and  Contemporary 
Integration  Approaches 

In the preceding section, we have commented in detail on the many advantages that the
Context Interchange approach has over traditional loose- and tight-coupling approaches. In
summary, although tightly-coupled systems may provide better support for data access to
heterogeneous systems (compared to loosely-coupled systems), they do not scale up effectively
given the complexity involved in constructing a shared schema for a large number of systems
and are generally unresponsive to changes for the same reason. Loosely-coupled systems, on the
other hand, require little central administration but are equally non-viable since they require
users to have intimate knowledge of the data sources being accessed; this assumption is generally
untenable when the number of systems involved is large and when changes are frequent¹⁰.
The  Context  Interchange  approach  provides  a  novel  middle  ground  between  the  two:  it  allows 
queries  to  sources  to  be  mediated  in  a  transparent  manner,  provides  systematic  support  for 
elucidating  the  semantics  of  data  in  disparate  sources  and  receivers,  and  at  the  same  time,  does 
not  succumb  to  the  complexities  inherent  in  maintenance  of  shared  schemas. 

At  a  cursory  level,  the  Context  Interchange  approach  may  appear  similar  to  many  contem- 
porary integration  approaches.  Examples  of  these  commonalities  include: 

•  framing  the  problem  in  McCarthy's  theory  of  contexts  [37]  (as  in  Carnot  [11],  and  more 
recently,  [18]); 

•  encapsulation  [3]  of  semantic  knowledge  in  a  hierarchy  of  rich  data  types  which  are  refined 
via  sub-typing  (as  in  several  object-oriented  multidatabase  systems,  the  archetype  of  which 
is  Pegasus  [2]); 

•  adoption  of  a  deductive  or  object-oriented  formalism  [28;  15]  (as  in  the  ECRC  Multi- 
database  System  [26]  and  DISCO  [50]); 

•  provision of value-added services through the use of mediators [54] (as in TSIMMIS [20]).

We  posit  that  despite  these  superficial  similarities,  our  approach  represents  a  radical  departure 
from  these  contemporary  integration  strategies. 

To  begin  with,  a  number  of  contemporary  integration  approaches  are  in  fact  attempts  aimed 
at rejuvenating the loose- or tight-coupling approach. These are often characterized by the
adoption of an object-oriented formalism which provides support for more effective data
transformation (e.g., O*SQL [32]) or mitigates the effects of complexity in schema creation and
change management through the use of abstraction and encapsulation mechanisms. To some
extent,  contemporary  approaches  such  as  Pegasus  [2],  the  ECRC  Multidatabase  Project  [26], 
and  DISCO  [50]  can  be  seen  as  examples  of  the  latter  strategy.  These  differ  from  the  Con- 
text Interchange  strategy  since  they  continue  to  rely  on  human  intervention  in  reconciling 
conflicts  a  priori  and  in  the  maintenance  of  shared  schemas.  Yet  another  difference  is  that  al- 
though a  deductive  object-oriented  formalism  is  also  used  in  the  Context  Interchange  approach, 
"semantic-objects"  in  our  case  exist  only  conceptually  and  are  never  actually  materialized.  One 
implication  of  this  is  that  mediated  queries  obtained  from  the  Context  Mediator  can  be  further 


¹⁰ We have drawn a sharp distinction between the two here to provide a contrast of their relative features. In
practice, one is most likely to encounter a hybrid of the two strategies. It should however be noted that the two
strategies are incongruent in their outlook and are not able to easily take advantage of each other's resources.
For instance, data semantics encapsulated in a shared schema cannot be easily extracted by a user to assist in
formulating a query which seeks to reference the source schemas directly.

optimized using traditional query optimizers or be executed by the query subsystem of classical
(relational) database systems without changes.

In  the  Carnot  system  [11],  semantic  interoperability  is  accomplished  by  writing  articulation 
axioms  which  translate  "statements"  which  are  true  in  individual  sources  to  statements  which 
are  meaningful  in  the  Cyc  knowledge  base  [31].  A  similar  approach  is  adopted  in  [18],  where 
it  is  suggested  that  domain-specific  ontologies  [22],  which  may  provide  additional  leverage  by 
allowing  the  ontologies  to  be  shared  and  reused,  can  be  used  in  place  of  Cyc.  While  we  like 
the  explicit  treatment  of  contexts  in  these  efforts  and  share  their  concern  for  sustaining  an 
infrastructure for data integration, our realization of these differs significantly. First, lifting
axioms [23] in our case operate at a finer level of granularity: rather than writing axioms which
map "statements" present in a data source to a common knowledge base, they are used for
describing "properties" of individual "data objects". Second, instead of having an "ontology"
which captures all structural relationships among data objects (much like a "global schema"),
we  have  a  domain  model  which  is  a  much  less  elaborate  collection  of  complex  semantic-types. 
These  differences  account  largely  for  the  scalability  and  extensibility  of  our  approach. 

Finally,  we  remark  that  the  TSIMMIS  [41;  42]  approach  stems  from  the  premise  that  in- 
formation integration  could  not,  and  should  not,  be  fully  automated.  With  this  in  mind, 
TSIMMIS  opted  in  favor  of  providing  both  a  framework  and  a  collection  of  tools  to  assist  hu- 
mans in  their  information  processing  and  integration  activities.  This  motivated  the  invention 
of  a  "light-weight"  object  model  which  is  intended  to  be  self-describing.  For  practical  purposes, 
this  translates  to  the  strategy  of  making  sure  that  attribute  labels  are  as  descriptive  as  possible 
and  opting  for  free-text  descriptions  ("man-pages")  which  provide  elaborations  on  the  seman- 
tics of  information  encapsulated  in  each  object.  We  concur  that  this  approach  may  be  effective 
when  the  data  sources  are  ill-structured  and  when  consensus  on  a  shared  vocabulary  cannot  be 
achieved.  However,  there  are  also  many  other  situations  (e.g.,  where  data  sources  are  relatively 
well-structured  and  where  some  consensus  can  be  reached)  where  human  intervention  is  not 
appropriate  or  necessary:  this  distinction  is  primarily  responsible  for  the  different  approaches 
taken  in  TSIMMIS  and  our  strategy. 

4     The COIN Data Model: Structure, Language, and Framework

In  [37],  McCarthy  pointed  out  that  statements  about  the  world  are  never  always  true  or  false: 
the  truth  or  falsity  of  a  statement  can  only  be  understood  with  reference  to  a  given  context. 
This is formalized using assertions of the form:

$$\bar{c}:\ \mathit{ist}(c, \sigma)$$

which suggests that the statement $\sigma$ is true ("ist") in the context $c$¹¹, this statement itself being
asserted in an outer context $\bar{c}$. Lifting axioms¹² are used to describe the relationship between
statements in different contexts. These statements are of the form

$$\bar{c}:\ \mathit{ist}(c, \sigma) \Leftrightarrow \mathit{ist}(c', \sigma')$$

which suggests that "$\sigma$ in $c$ states the same thing as $\sigma'$ in $c'$".

McCarthy's notions of "contexts" and "lifting axioms" provide a useful framework for modeling
statements in heterogeneous databases which are seemingly in conflict with one another.
From this perspective, factual statements present in a data source are no longer "universal"
facts about the world, but are true relative to the context associated with the source but not
necessarily so in a different context. Thus, if we assign the labels $c_1$ and $c_2$ to contexts associated
with sources 1 and 2 in Figure 1, we may now write:

$$
\begin{aligned}
\bar{c}:\quad & \mathit{ist}(c_1,\ \mathit{r1}(\text{'NTT'}, 1\,000\,000, \text{'JPY'})). \\
\bar{c}:\quad & \mathit{ist}(c_2,\ \mathit{r2}(\text{'NTT'}, 5\,000\,000)).
\end{aligned}
$$

The context $\bar{c}$ above refers to the ubiquitous context in which our discourse is conducted (i.e.,
the integration context) and may be omitted in the ensuing discussion whenever there is no
ambiguity.

In  the  Context  Interchange  approach,  the  semantics  of  data  are  captured  explicitly  in  a 
collection  of  statements  asserted  in  the  context  associated  with  each  source,  while  allowing 
conflict  detection  and  reconciliation  to  be  deferred  to  the  time  when  a  query  is  submitted. 
Building  on  the  ideas  developed  in  [48;  44],  we  would  like  to  be  able  to  represent  the  semantics 
of  data  at  the  level  of  individual  data  elements  (as  opposed  to  the  predicate  or  sentential  level), 
which  allows  us  to  identify  and  deal  with  conflicts  at  a  finer  level  of  granularity.  Unfortunately, 
individual  data  elements  may  be  present  in  a  relation  without  a  unique  denotation.  For  instance, 
the value 1 000 000 in relation r1 (as shown in Figure 1) simultaneously describes the revenue of
IBM and NTT while being reported in different currencies and scale-factors. Thus, the statements


¹¹ In the words of Guha [23], contexts represent "the reification of the context dependencies of the sentences
associated with the context." They are said to be "rich-objects" in that "they cannot be defined or completely
described" [38]. Consider, for instance, the context associated with the statement: "There are four billion people
living on Earth". To fully qualify the sentence, we might add that it assumes that the time is 1991. However,
this certainly is not the only relevant assumption in the underlying context, since there are implicit assumptions
about who is considered a "live person" (are fetuses in the womb alive?), or what it means to be "on earth"
(does it include people who are in orbit around the earth?).
¹² Also called articulation axioms in Cyc/Carnot [11].

$$
\begin{aligned}
& \mathit{ist}(c_1,\ \mathit{currency}(R, Y) \leftarrow \mathit{r1}(N, R, Y)). \\
& \mathit{ist}(c_1,\ \mathit{scaleFactor}(R, 1000) \leftarrow \mathit{currency}(R, Y),\ Y = \text{'JPY'}). \\
& \mathit{ist}(c_1,\ \mathit{scaleFactor}(R, 1) \leftarrow \mathit{currency}(R, Y),\ Y \neq \text{'JPY'}).
\end{aligned}
$$

intending  to  represent  the  currencies  and  scale-factors  of  revenue  amounts  will  result  in  multiple 
inconsistent  values.  To  circumvent  this  problem,  we  introduce  semantic-objects,  which  can 
be  referenced  unambiguously  through  their  object-ids.  Semantic-objects  are  complex  terms 
constructed from the corresponding data values (also called primitive-objects) and are used as
a basis for reasoning about conflicts, but are never materialized in an object-store. This will be
described in further detail in the next section.
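
The ambiguity, and the way object-ids remove it, can be seen in a short Python sketch. The rows of r1 used here are assumptions for illustration (Figure 1 is not reproduced; in particular the currency of the IBM row is assumed), and the tuple-based skolem helper is a hypothetical stand-in for a Skolem term.

```python
# Hypothetical rows of r1(name, revenue, currency); the IBM row is assumed.
r1 = [("IBM", 1_000_000, "USD"), ("NTT", 1_000_000, "JPY")]

# Keying properties on the raw data value: both rows collide on 1_000_000,
# so the value ends up with two currencies at once.
currency_by_value = {}
for name, revenue, cur in r1:
    currency_by_value.setdefault(revenue, set()).add(cur)


def skolem(relation, attribute, key):
    """Build an oid as a term over the key value; nothing is stored or materialized."""
    return (relation, attribute, key)


# Keying on the oid instead gives each revenue amount exactly one currency.
currency_by_oid = {skolem("r1", "revenue", name): cur for name, _, cur in r1}

print(currency_by_value[1_000_000], currency_by_oid[skolem("r1", "revenue", "NTT")])
```

Because the oid is computed from key values on demand, it can be used in inference without creating any object-store.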

The  data  model  underlying  our  integration  approach,  called  COIN  (for  context  INterchange), 
consists of both a structural component describing how data objects are organized, and a
language which provides the basis for making formal assertions and inferences about a universe
of  discourse.  In  the  remainder  of  this  section,  we  provide  a  description  of  both  of  these  com- 
ponents, followed  by  a  formal  characterization  of  the  Context  Interchange  strategy  in  the  form 
of  a  COIN  framework.  The  latter  will  be  illustrated  with  reference  to  the  motivational  example 
introduced  in  Section  2. 

4.1     The  Structural  Elements  of  COIN 

The  COIN  data  model  is  a  deductive  object-oriented  data  model  designed  to  provide  explicit 
support for Context Interchange. Consistent with object-orientation [3], information units are
modeled as objects, having unique and immutable object-ids (oids), and corresponding to types
in a generalization hierarchy with provision for non-monotonic inheritance. We distinguish
between two kinds of data objects in COIN: primitive-objects, which are instances of primitive-types,
and semantic-objects, which are instances of semantic-types. Objects in COIN have both an
oid  and  a  value:  these  are  identical  in  the  case  of  primitive-objects,  but  different  for  semantic- 
objects.  This  is  an  important  distinction  which  will  become  apparent  shortly. 

Primitive-types  correspond  to  data  types  (e.g.,  strings,  integers,  and  reals)  which  are  native 
to  sources  and  receivers.  Semantic-types,  on  the  other  hand,  are  complex  types  introduced  to 
support  the  underlying  integration  strategy.  Specifically,  semantic-objects  may  have  proper- 
ties which  are  either  attributes  or  modifiers.  Attributes  represent  structural  properties  of  the 
semantic-object under investigation: for instance, an object of the semantic-type
companyFinancials must, by definition, describe some company; we capture this structural dependency
by defining the attribute company for the semantic-type companyFinancials. Modifiers, on the
other hand, are used as the basis for capturing "orthogonal" sources of variations concerning
how the value of a semantic-object may be interpreted. Consider the semantic-type moneyAmt:
the modifiers currency and scaleFactor defined for moneyAmt suggest two sources of variations
in how the value corresponding to an instance of moneyAmt may be interpreted. "Orthogonality"
here refers to the fact that the value which can be assigned to one modifier is independent
of other modifiers, as is the case with scaleFactor and currency. This is not a limitation on the
expressiveness of the model since two sources of variations which are correlated can always be
modeled as a single modifier. As we shall see later, this simplification allows greater flexibility
in dealing with conversions of values across different contexts.

Unlike primitive-objects, the value of a semantic-object may be different in different contexts.
For example, if the (Skolem) term $sk_0$ is the oid for the object representing the revenue
of NTT, it is perfectly legitimate for both

(1) $\mathit{ist}(c_1, \mathit{value}(sk_0, 1\,000\,000))$; and

(2) $\mathit{ist}(c_2, \mathit{value}(sk_0, 9\,600\,000))$,

to be true since contexts $c_1$ and $c_2$ embody different assumptions on what currencies and
scale-factors are used to report the value of a revenue amount¹³. For our problem domain, it is often
the  case  that  the  value  of  a  semantic-object  is  known  in  some  context,  but  not  others.  This  is 
the  case  in  the  example  above,  where  (1)  is  known,  but  not  (2).  The  derivation  of  (2)  is  aided 
by  a  special  lifting  axiom  defined  below. 

Definition 1 Let $t$ be an oid-term corresponding to a semantic-object of the semantic-type $\tau$,
and suppose the value of $t$ is given in context $c_s$. For any context represented by $C$, we have

$$\mathit{ist}(C,\ \mathit{value}(t, X) \leftarrow f_{\mathrm{cvt}}(t, c_s, X') = X) \;\Leftarrow\; \mathit{ist}(c_s,\ \mathit{value}(t, X')).$$

We refer to $f_{\mathrm{cvt}}$ as the conversion function for $t$ in context $C$, and say that $X$ is the value of $t$
in context $C$, and that it is derived from context $c_s$. □

As  we  shall  see  later,  the  conversion  function  referenced  above  is  polymorphically  defined,  being 
dependent  on  the  type  of  the  object  to  which  it  is  applied,  and  may  be  different  in  distinct 
contexts. 

Since  modifiers  of  a  semantic-type  are  orthogonal  by  definition,  the  conversion  function 
referenced  in  the  preceding  definition  can  in  fact  be  composed  from  other  simpler  conversion 
methods  defined  with  reference  to  each  modifier.  To  distinguish  between  the  two,  we  refer  to  the 
first  as  a  composite  conversion  function,  and  the  latter  as  atomic  conversion  functions.  Suppose 


¹³ A predicate-calculus language is used in the discussion here since it provides better intuition for most readers.
The COIN language, for which properties are modeled as "methods" (allowing us to write $sk_0[\mathit{value} \to 1\,000\,000]$
as opposed to $\mathit{value}(sk_0, 1\,000\,000)$), will be formally defined in Section 4.2.

modifiers of a semantic-type $\tau$ are $m_1, \ldots, m_k$, and $f_{\mathrm{cvt}}$ is a composite conversion function for
$\tau$. It follows that if $t$ is an object of type $\tau$, then

$$f_{\mathrm{cvt}}(t, c_s, X') = X \quad \text{if } \exists X_1, \ldots, X_{k-1} \text{ such that} \quad \bigl(f^{m_1}_{\mathrm{cvt}}(t, c_s, X') = X_1\bigr) \wedge \cdots \wedge \bigl(f^{m_k}_{\mathrm{cvt}}(t, c_s, X_{k-1}) = X\bigr)$$

where $f^{m_j}_{\mathrm{cvt}}$ corresponds to the atomic conversion function with respect to modifier $m_j$. Notice
that the order in which the conversions are eventually effected need not correspond to the
ordering of the atomic conversions imposed here, since the actual conversions are carried out
in a lazy fashion and depend on the propagation of variable bindings.
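
The composition of atomic conversions can be illustrated with a short Python sketch for the revenue of NTT. Several details are assumptions made for the example: the scale-factor of 1 in $c_2$, the exchange rate (chosen so that the result agrees with the value 9 600 000 quoted earlier), and the fixed, eager order of application, which stands in for the lazy evaluation described above.

```python
from fractions import Fraction

# Modifier values of the NTT revenue object in each context. c1 follows the
# rules stated for r1 (JPY amounts are reported in thousands); the
# scale-factor of 1 in c2 is an assumption.
MODIFIERS = {
    "c1": {"scaleFactor": 1000, "currency": "JPY"},
    "c2": {"scaleFactor": 1, "currency": "USD"},
}

# Assumed exchange rate, chosen to reproduce the 9 600 000 quoted in the text.
RATES = {("JPY", "USD"): Fraction(96, 10_000)}


def cvt_scale_factor(x, src, dst):
    """Atomic conversion with respect to the modifier scaleFactor."""
    return x * Fraction(MODIFIERS[src]["scaleFactor"], MODIFIERS[dst]["scaleFactor"])


def cvt_currency(x, src, dst):
    """Atomic conversion with respect to the modifier currency."""
    a, b = MODIFIERS[src]["currency"], MODIFIERS[dst]["currency"]
    return x if a == b else x * RATES[(a, b)]


ATOMIC = [cvt_scale_factor, cvt_currency]  # one atomic conversion per modifier


def f_cvt(x, src, dst):
    """Composite conversion: thread the value through each atomic conversion."""
    for atomic in ATOMIC:
        x = atomic(x, src, dst)
    return x


value_in_c2 = f_cvt(Fraction(1_000_000), "c1", "c2")
print(value_in_c2)
```

Since the two modifiers are orthogonal, applying the atomic conversions in the opposite order yields the same result in this sketch.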

Finally, we note that value-based comparisons in the relational model require some adjustments
here. We say that two semantic-objects are distinct if their oids are different. However,
distinct semantic-objects may be semantically-equivalent as defined below.

Definition 2 Let $\theta$ be a relational operator from the set $\{=, <, >, \leq, \geq, \neq, \ldots\}$. If $t$ and $t'$ are
oid-terms corresponding to semantic-objects, then

$$(t \,\theta\, t') \;\Leftrightarrow\; (\mathit{value}(t, X) \wedge \mathit{value}(t', X') \wedge X \,\theta\, X')$$

In particular, we say that $t$ and $t'$ are semantically-equivalent in context $c$ if $\mathit{ist}(c, t = t')$. □

We sometimes abuse the notation slightly by allowing primitive-objects to participate in
semantic-comparisons. Recall that we do not distinguish between the oid and the value of a primitive
object; thus, $ist(C, value(1\,000\,000, 1\,000\,000))$ is true regardless of what $C$ may be. Suppose
we know that $ist(c_1, value(sk_0, 1\,000\,000))$, where $sk_0$ refers to the revenue of NTT as before.
The expression

\[
sk_0 < 5\,000\,000
\]

will therefore evaluate to "true" in context $c_1$ but not in context $c_2$, since
$ist(c_2, value(sk_0, 9\,600\,000))$. This latter fact can be derived from the value of $sk_0$ in $c_1$
(which is reported a priori in $r_1$) and the conversion function associated with the type
companyFinancials (see Section 5.3).
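
To make Definition 2 concrete, the following Python sketch evaluates the comparison above in both contexts. It is only an illustration: the table `VALUE` is an assumed stand-in for the facts $ist(c, value(t, X))$, and the names `value_in` and `semantic_compare` are hypothetical helpers, not part of the COIN language.

```python
import operator

# Assumed stand-in for the facts ist(c, value(t, X)): context -> oid -> value.
# The two figures for sk0 are the ones quoted in the text.
VALUE = {
    "c1": {"sk0": 1_000_000},
    "c2": {"sk0": 9_600_000},
}


def value_in(context, term):
    """Return the value a term takes in a context.

    Primitive objects (plain numbers here) are their own value in every
    context; semantic-objects are looked up by oid.
    """
    if isinstance(term, (int, float)):
        return term
    return VALUE[context][term]


def semantic_compare(context, op, left, right):
    """Evaluate (left op right) on the values both terms take in `context`."""
    return op(value_in(context, left), value_in(context, right))


in_c1 = semantic_compare("c1", operator.lt, "sk0", 5_000_000)
in_c2 = semantic_compare("c2", operator.lt, "sk0", 5_000_000)
print(in_c1, in_c2)  # True False
```

The same oid is compared in both calls; only the context in which its value is read changes.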

4.2     The  Language  of  COIN 

We describe in this section the syntax and informal semantics of the language of COIN, which is
inspired largely by Gulog [15]*. Rather than making inferences using a context logic (see, for
example, [6]), we introduce "contexts" as first-class objects and capture variations in different
contexts through the use of parameterized methods. For example, the context-formula
$ist(c_1, value(sk_0, 1\,000\,000))$ can be equivalently written as $sk_0[value(c_1) \rightarrow 1\,000\,000]$ where $value(c_1)$
represents a (single-valued) method. This simplification is possible because of our commitment
to a common "vocabulary" (i.e., what types exist and what methods are applicable) and the
fact that object ids remain immutable across different contexts. By writing statements which
are fully decontextualized (i.e., "lifted" from the individual source and receiver contexts into
the integration context), we are able to leverage semantics and proof procedures developed
without provision for contexts.

* Gulog differs from F-logic [28] in that method rules are bound to the underlying types, which leads to different
approaches for dealing with non-monotonic inheritance. Specifically, in the case of F-logic, it is not rules but
ground expressions that are handed down the generalization hierarchy. Since we are interested in reasoning at
the intensional level, the former model is more appropriate for us.

Following [34], we define an alphabet as consisting of (1) a set of type symbols which are
partitioned into symbols representing semantic-types and primitive-types, each of which has
a distinguished type symbol, denoted by $T_S$ and $T_P$ respectively; (2) an infinite set of constant
symbols which represent the oids (or identically, values) of primitive-objects; (3) a set of
function symbols and predicate symbols; (4) a set of method symbols corresponding to attributes,
modifiers, and built-in methods (e.g., value and cvt); (5) an infinite set of variables; (6) the
usual logical connectives and quantifiers $\land$, $\lor$, $\forall$, $\exists$, $\neg$, etc.; (7) auxiliary symbols such as (, ), [, ],
:, ::, $\rightarrow$, $\Rightarrow$ and so forth; and finally, (8) a set of context symbols, of the distinguished object-type
called ctx, denoting contexts. A term is either a constant, a variable, or the token $f(t_1, \ldots, t_n)$
where $f$ is a function symbol and $t_1, \ldots, t_n$ are terms. Since terms in our model refer to (logical)
oids, they are called oid-terms. Finally, a predicate, function, or method symbol is said to be
$n$-ary if it expects $n$ arguments.

Definition 3 A declaration is defined as follows:

• if $\tau$ and $\tau'$ are type symbols, then $\tau :: \tau'$ is a type declaration. We say that $\tau$ is a subtype
of $\tau'$, and conversely, that $\tau'$ is a supertype of $\tau$. For any type symbol $\tau''$ such that $\tau' :: \tau''$,
$\tau$ is also a subtype of $\tau''$.

• if $t$ is a term and $\tau$ is a type symbol, then $t : \tau$ is an object declaration. We say that $t$ is
an instance of type $\tau$. If $\tau'$ is a supertype of $\tau$, then $t$ is said to be of inherited type $\tau'$.

• if $p$ is an $n$-ary predicate symbol, and $\tau_1, \ldots, \tau_n$ are type symbols, then $p(\tau_1, \ldots, \tau_n)$ is a
predicate declaration. We say that the signature of predicate $p$ is $\tau_1 \times \cdots \times \tau_n$.

• if $m$ is an attribute symbol and $\tau$, $\tau'$ are symbols denoting semantic-types, then $\tau[m \Rightarrow \tau']$
is an attribute declaration. We say that the signature of the attribute is $\tau \rightarrow \tau'$, and that
the semantic-type $\tau$ has attribute $m$.

• if $m$ is a modifier symbol, and $\tau$, $\tau'$ are symbols denoting semantic-types, then $\tau[m(ctx) \Rightarrow \tau']$
is a modifier declaration. We say that $m$ is a modifier of the semantic-type $\tau$, which has
signature $\tau \rightarrow \tau'$. Without any loss of generality, we assume that $m$ is unique across all
semantic-types.

• if $\tau$ is a semantic-type, and $\tau_1$, $\tau_2$ are primitive types, then $\tau[cvt(ctx)@ctx, \tau_1 \Rightarrow \tau_2]$ is a
compound conversion declaration. We say that the signature of the compound conversion
for $\tau$ is $\tau \times \tau_1 \rightarrow \tau_2$.

• if $\tau$ is a semantic-type, $m$ is a modifier defined on $\tau$, and $\tau_1$, $\tau_2$ are primitive types, then
$\tau[cvt(ctx)@m, ctx, \tau_1 \Rightarrow \tau_2]$ is an atomic conversion declaration. We say that the signature
of the atomic conversion of $m$ for $\tau$ is $\tau \times \tau_1 \rightarrow \tau_2$.

• if $\tau$ is a semantic-type, $\tau_1$ is a primitive-type and $c$ is a context symbol, then $\tau[value(ctx) \Rightarrow \tau_1]$
is a value declaration. We say that the signature of the value for $\tau$ is given by $\tau \rightarrow \tau_1$.

Declarations for attributes, modifiers, conversions, and the built-in method value are collectively
referred to as method declarations. □

Definition 4 An atom is defined as follows:

• if $p$ is an $n$-ary predicate symbol with signature $\tau_1 \times \cdots \times \tau_n$ and $t_1, \ldots, t_n$ are of (inherited)
type $\tau_1, \ldots, \tau_n$ respectively, then $p(t_1, \ldots, t_n)$ is a predicate atom.

• if $m$ is an attribute symbol with signature $\tau \rightarrow \tau'$ and $t$, $t'$ are of (inherited) types $\tau$, $\tau'$
respectively, then $t[m \rightarrow t']$ is an attribute atom.

• if $m$ is a modifier symbol with signature $\tau \rightarrow \tau'$, $c$ is a context symbol, and $t$, $t'$ are of
(inherited) types $\tau$, $\tau'$ respectively, then $t[m(c) \rightarrow t']$ is a modifier atom.

• if the compound conversion function for $\tau$ has signature $\tau \times \tau_1 \rightarrow \tau_2$, $t$, $t_1$, $t_2$ are of
(inherited) types $\tau$, $\tau_1$, $\tau_2$ respectively, $c$ is a context symbol, and $t_c$ is a context term, then
$t[cvt(c)@t_c, t_1 \rightarrow t_2]$ is a compound conversion atom.

• if the atomic conversion of the modifier $m$ has signature $\tau \times \tau_1 \rightarrow \tau_2$, $c$ is a context
symbol, $t$, $t_1$, $t_2$ are of (inherited) types $\tau$, $\tau_1$, $\tau_2$ respectively, and $t_c$ is a context term, then
$t[cvt(c)@m, t_c, t_1 \rightarrow t_2]$ is an atomic conversion atom for $m$.

• if the value signature is given by $\tau \rightarrow \tau_1$, $c$ is a context symbol, and $t$, $t'$ are of (inherited)
types $\tau$, $\tau_1$, then $t[value(c) \rightarrow t']$ is a value atom.


As  before,  the  atoms  corresponding  to  attributes,  modifiers,  conversions,  and  built-in  method 
value  are  referred  to  collectively  as  method  atoms.  □ 

Atoms  can  be  combined  to  form  molecules  (or  compound  atoms):  these  are  "syntactic  sugar" 
which  are  notationally  convenient,  but  by  themselves  do  not  increase  the  expressive  power  of 
the  language.  For  example,  we  may  write 

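• $t[m_1 \rightarrow t_1; \ldots; m_k \rightarrow t_k]$ as a shorthand for the conjunction $t[m_1 \rightarrow t_1] \land \cdots \land t[m_k \rightarrow t_k]$;

• $t[m \rightarrow t_1[m_1 \rightarrow t_2]]$ as a shorthand for $t[m \rightarrow t_1] \land t_1[m_1 \rightarrow t_2]$; and

• $t : \tau[m \rightarrow t']$ as a shorthand for $t : \tau \land t[m \rightarrow t']$.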
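
The three shorthands can be unfolded mechanically. The Python sketch below shows one way to do so; the `Molecule` class and the `expand` function are hypothetical illustrations (they are not part of the COIN language), and atoms are rendered as plain strings for readability.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Molecule:
    """A compound atom: an oid, an optional type, and (method, target) pairs."""
    oid: str
    type_: Optional[str] = None
    methods: List[Tuple[str, Union[str, "Molecule"]]] = field(default_factory=list)


def expand(mol):
    """Unfold a molecule into the list of atoms (and declarations) it abbreviates."""
    atoms = []
    if mol.type_ is not None:
        # t : tau[...] contributes the object declaration t : tau.
        atoms.append(f"{mol.oid} : {mol.type_}")
    for method, target in mol.methods:
        if isinstance(target, Molecule):
            # t[m -> t1[...]] contributes t[m -> t1] plus the atoms of t1[...].
            atoms.append(f"{mol.oid}[{method} -> {target.oid}]")
            atoms.extend(expand(target))
        else:
            atoms.append(f"{mol.oid}[{method} -> {target}]")
    return atoms


nested = Molecule("t", "tau", [("m", Molecule("t1", methods=[("m1", "t2")]))])
atoms = expand(nested)
print(atoms)  # ['t : tau', 't[m -> t1]', 't1[m1 -> t2]']
```

Because the expansion only produces atoms that could have been written directly, it adds no expressive power, which is the sense in which molecules are syntactic sugar.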

Well-formed formulas can be defined inductively in the same manner as in first-order
languages [34]; specifically,

• an atom is a formula;

• if $\phi$ and $\psi$ are formulas, then $\neg\phi$, $\phi \land \psi$ and $\phi \lor \psi$ are all formulas;

• if $\phi$ is a formula and $X$ is a variable occurring in $\phi$, then both $(\forall X\, \phi)$ and $(\exists X\, \phi)$ are
formulas.

Instead  of  dealing  with  the  complexity  of  full-blown  first-order  logic,  it  is  customary  to  restrict 
well-formed  formulas  to  only  clauses. 

Definition 5 A Horn clause in the COIN language is a statement of the form

\[
\Gamma \vdash A \leftarrow B_1, \ldots, B_n
\]

where $A$ can either be an atom or a declaration, and $B_1, \ldots, B_n$ is a conjunction of atoms. $A$
is called the head, and $B_1, \ldots, B_n$ is called the body of the clause. If $A$ is a method atom of
the form $t[m@\ldots \rightarrow t']$ where $t$ is a term denoting a semantic-object, then the predeclaration $\Gamma$
must contain the object declarations for all oid-terms in the head. Otherwise, $\Gamma$ may be omitted
altogether. □

4.3     The  COIN  Framework 

The COIN framework builds on the COIN data model to provide a formal characterization of the
Context Interchange strategy for the integration of heterogeneous data sources.

Definition 6 A COIN framework $\mathcal{F}$ is a quintuple $\langle \mathcal{S}, \mu, \mathcal{E}, \mathcal{D}, \mathcal{C} \rangle$ where

• $\mathcal{S}$, the source set, is a labeled multi-set $\{s_1 := S_1, \ldots, s_m := S_m\}$. The label $s_i$ is the
name of a source, and $S_i$ consists of ground predicate atoms $r_{ij}(a_1, \ldots)$ as well as the
integrity constraints which are known to hold on those predicates. The set of atoms of
$r_{ij}$ constitutes a relation $r_{ij}$ in $s_i$.

• $\mu$, the source-to-context mapping, defines a (total) function from $\mathcal{S}$ to $\mathcal{C}$. If $\mu(s_i) = c_j$, we
say that the source $s_i$ is in context $c_j$.

• $\mathcal{D}$, the domain model, is a set consisting of declarations. Intuitively, declarations in the
domain model identify the types, methods, and predicates which are known.

• $\mathcal{E}$, the elevation set, is a multi-set $\{E_1, \ldots, E_m\}$ where $E_i$ is the set of elevation axioms
corresponding to $s_i$ in $\mathcal{S}$. $E_i$ consists of three parts:

  - for each relation $r_{ij} \in S_i$, there is a clause which defines a corresponding semantic-relation
    $r'_{ij}$ in which every primitive object in $r_{ij}$ is replaced by a Skolem term in $r'_{ij}$;

  - for every oid-term in $r'_{ij}$, we identify its type via the introduction of an object
    declaration, and define the values which are assigned to structural properties (i.e.,
    attributes); and

  - for every oid-term in $r'_{ij}$, we define its value in context $c$ ($= \mu(s_i)$) with reference to
    $r_{ij}$.

• $\mathcal{C}$, the context multi-set, is a labeled multi-set $\{c_1 := C_1, \ldots, c_n := C_n\}$ where $c_i$ is a
context symbol, and $C_i$, called the context set for $c_i$, is a set of clauses which provides a
description of the relevant data semantics in context $c_i$. □

We provide the intuition for the above definition by demonstrating how the integration scenario
shown in Figure 1 can be represented in a COIN framework $\mathcal{F} = \langle \mathcal{S}, \mu, \mathcal{E}, \mathcal{D}, \mathcal{C} \rangle$. Figures 3 and 4
present a partial codification which we will elaborate briefly below:

• The content of the source set $\mathcal{S}$ is simply the set of ground atoms present in the data
sources. We place no limitation on the number of relations which may be present in
each source; in the current example, it happens that each source has only one relation.
The rules following the ground atoms are functional dependencies which are known to be
true in the respective relation. For instance, the two rules in $S_1$ define the functional
dependency $cname \rightarrow \{revenue, currency\}$ on the attributes in $r_1$.

• The function $\mu$ is defined as a relation on $\mathcal{S} \times \mathcal{C}$: thus, source $s_1$ is mapped to context $c_1$,
whereas $s_2$ and $s_3$ are both mapped to context $c_2$.
Source set S

    s1 := { r1('IBM', 1000000, 'USD').   r1('NTT', 1000000, 'JPY').
            R1 = R2 <- r1(N, R1, _), r1(N, R2, _).
            Y1 = Y2 <- r1(N, _, Y1), r1(N, _, Y2). }

    s2 := { r2('IBM', 1500000).   r2('NTT', 5000000).
            E1 = E2 <- r2(N, E1), r2(N, E2). }

    s3 := { r3('USD', 'JPY', 104.0).   r3('JPY', 'USD', 0.0096).
            T1 = T2 <- r3(X, Y, T1), r3(X, Y, T2). }

Source-to-Context Mapping μ

    { μ(s1, c1), μ(s2, c2), μ(s3, c2) }

Domain model D

Left half (semantic-types):

    /* type declarations */
    semanticNumber :: T_S.
    semanticString :: T_S.
    moneyAmt :: semanticNumber.
    companyFinancials :: moneyAmt.
    currencyType :: semanticString.
    companyName :: semanticString.

    /* attribute declaration */
    companyFinancials[company => companyName].

    /* modifier declarations */
    moneyAmt[currency(ctx) => currencyType; scaleFactor(ctx) => semanticNumber].

    /* value declarations */
    semanticString[value(ctx) => varchar].
    semanticNumber[value(ctx) => number].

    /* conversion declarations */
    semanticString[cvt(ctx)@ctx, varchar => varchar].
    semanticNumber[cvt(ctx)@ctx, number => number].
    moneyAmt[cvt(ctx)@ctx, number => number].
    moneyAmt[cvt(ctx)@ctx, scaleFactor, number => number].
    moneyAmt[cvt(ctx)@ctx, currency, number => number].

    /* predicate declarations */
    r1'(companyName, companyFinancials, currencyType).
    r2'(companyName, companyFinancials).
    r3'(currencyType, currencyType, semanticNumber).

Right half (primitive-types):

    /* type declarations */
    number :: T_P.
    varchar :: T_P.
    integer :: number.
    real :: number.

    /* predicate declarations */
    r1(varchar, integer, varchar).
    r2(varchar, integer).
    r3(varchar, varchar, real).

Figure 3: The source set, source-to-context mapping, and domain model for the COIN framework
corresponding to the motivational example.
• The domain model $\mathcal{D}$ consists of two parts. The left-half (as seen in Figure 3) identifies
(1) the semantic-types which are known and the generalization hierarchy; (2) the declarations
for methods which are applicable to the semantic-types; and (3) the signatures for the
predicates corresponding to the semantic relations ($r'_i$). The right-half does the same for
primitive-types and predicates for the extensional relations.

• The first clause in each $E_i$ of the elevation set $\mathcal{E}$ defines the semantic relation $r'_i$
corresponding to the relation $r_i$; the semantic relations are defined on semantic-objects (as
opposed to primitive-objects), which are instantiated as Skolem terms. The Skolem function
(e.g., $f_{r_2\#expenses}$) is chosen in such a way that when applied to the key-value
of a tuple in the corresponding relation (e.g., 'NTT'), the resulting Skolem term (i.e.,
$f_{r_2\#expenses}(\text{'NTT'})$) identifies a unique "cell" in the relation as shown in
Figure 2 (in this case, the expenses of NTT as reported in relation $r_2$).

• Object declarations and attribute atoms in the elevation set provide a way of specifying
the types of corresponding Skolem terms introduced in the semantic relation. For instance,
any Skolem term $f_{r_1\#revenue}(\_)$ is asserted to be an instance of the semantic-type
companyFinancials. The attribute atom following this declaration defines the object that
is assigned to the company attribute for this semantic-object.

• The values of the Skolem terms introduced in the semantic relation are defined through
the clauses shown last. The primitive-objects assigned are obtained directly from the
extensional relation. Clearly, the value assignment is valid only within the context of the
source as identified by $\mu$; the values of the Skolem terms in a different context can be
derived through the use of conversion functions, which we will define later.
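
The three parts of an elevation set can be mimicked procedurally. The following Python sketch elevates relation $r_2$ of the example; it is an illustration under stated assumptions, not the COIN implementation. The helper names `skolem` and `elevate_r2`, the string encoding of Skolem terms, and the dictionaries used to hold types, attributes and values are all hypothetical.

```python
# Extensional relation r2 and the source-to-context mapping, as in Figure 3.
R2 = [("IBM", 1_500_000), ("NTT", 5_000_000)]
MU = {"s1": "c1", "s2": "c2", "s3": "c2"}


def skolem(relation, column, *key):
    """Build a printable Skolem term such as f_r2#expenses('NTT')."""
    args = ", ".join(repr(k) for k in key)
    return f"f_{relation}#{column}({args})"


def elevate_r2(tuples, source="s2"):
    """Return the semantic relation, object types, attributes and values for r2."""
    context = MU[source]
    semantic_relation, types, attributes, values = [], {}, {}, {}
    for cname, expenses in tuples:
        name_obj = skolem("r2", "cname", cname)
        expenses_obj = skolem("r2", "expenses", cname)
        # Part 1: every primitive object is replaced by a Skolem term.
        semantic_relation.append((name_obj, expenses_obj))
        # Part 2: object declarations and the structural attribute `company`.
        types[name_obj] = "companyName"
        types[expenses_obj] = "companyFinancials"
        attributes[(expenses_obj, "company")] = name_obj
        # Part 3: values are asserted only for the context of the source.
        values[(name_obj, context)] = cname
        values[(expenses_obj, context)] = expenses
    return semantic_relation, types, attributes, values


semantic_r2, types, attributes, values = elevate_r2(R2)
print(semantic_r2[1])  # ("f_r2#cname('NTT')", "f_r2#expenses('NTT')")
```

Note that `values` has no entry for any context other than $c_2$: values in other contexts are meant to be derived through conversion functions rather than stored.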

The context multi-set $\mathcal{C}$ is given by $\{c_1 := C_1, c_2 := C_2\}$ and is defined by the axioms
shown in Figure 5. There are two kinds of axioms: modifier value definitions and conversion
definitions.

Consistent with our data model, modifiers can be assigned different values in distinct
contexts: this constitutes the principal mechanism for describing the meaning of data in disparate
contexts. For example, the fact that in context $c_1$, companyFinancials are reported using a
scale-factor of 1 000 whenever they are reported in JPY, and 1 otherwise, can be represented by the
formula:

\[
\begin{array}{l}
\forall X' : companyFinancials \;\; \exists F' : number \vdash \\
\quad ( X'[scaleFactor(c_1) \rightarrow F'] ) \;\land \\
\quad ( F'[value(c_1) \rightarrow 1000] \leftarrow X'[currency(c_1) \rightarrow Y'] \land Y' = \text{'JPY'} ) \;\land
\end{array}
\]
Elevation Axioms E1 of E

    r1'(f_r1#cname(X1), f_r1#revenue(X1), f_r1#currency(X1)) <- r1(X1, _, _).

    f_r1#cname(_) : companyName.
    f_r1#revenue(_) : companyFinancials.
    f_r1#revenue(X1)[company -> f_r1#cname(X1)].
    f_r1#currency(_) : currencyType.

    f_r1#cname(X1)[value(C) -> X1] <- r1(X1, _, _), μ(s1, C).
    f_r1#revenue(X1)[value(C) -> X2] <- r1(X1, X2, _), μ(s1, C).
    f_r1#currency(X1)[value(C) -> X3] <- r1(X1, _, X3), μ(s1, C).

Elevation Axioms E2 of E

    r2'(f_r2#cname(X1), f_r2#expenses(X1)) <- r2(X1, _).

    f_r2#cname(_) : companyName.
    f_r2#expenses(_) : companyFinancials.
    f_r2#expenses(X1)[company -> f_r2#cname(X1)].

    f_r2#cname(X1)[value(C) -> X1] <- r2(X1, _), μ(s2, C).
    f_r2#expenses(X1)[value(C) -> X2] <- r2(X1, X2), μ(s2, C).

Elevation Axioms E3 of E

    r3'(f_r3#fromCur(X1, X2), f_r3#toCur(X1, X2), f_r3#exchangeRate(X1, X2)) <- r3(X1, X2, _).

    f_r3#fromCur(_, _) : currencyType.
    f_r3#toCur(_, _) : currencyType.
    f_r3#exchangeRate(_, _) : semanticNumber.

    f_r3#fromCur(X1, X2)[value(C) -> X1] <- r3(X1, X2, _), μ(s3, C).
    f_r3#toCur(X1, X2)[value(C) -> X2] <- r3(X1, X2, _), μ(s3, C).
    f_r3#exchangeRate(X1, X2)[value(C) -> X3] <- r3(X1, X2, X3), μ(s3, C).

Figure 4: Elevation set corresponding to the motivational example
Context c1:

    /* modifier value assignments */
    X' : companyFinancials |- X'[scaleFactor(c1) -> scaleFactor(c1, X')].
    X' : companyFinancials, scaleFactor(c1, X') : number |-
        scaleFactor(c1, X')[value(c1) -> 1] <- X'[currency(c1) -> Y'], Y' ≠ 'JPY'.
    X' : companyFinancials, scaleFactor(c1, X') : number |-
        scaleFactor(c1, X')[value(c1) -> 1000] <- X'[currency(c1) -> Y'], Y' = 'JPY'.
    X' : companyFinancials |- X'[currency(c1) -> currency(c1, X')].
    X' : companyFinancials, currency(c1, X') : currencyType |-
        currency(c1, X')[value(c1) -> Y] <- X'[company -> N0'], r1'(N1', R', Y'), N0' = N1',
                                            Y'[value(c1) -> Y].

    /* conversion function definitions */
    X' : moneyAmt |-
        X'[cvt(c1)@C, U -> V] <- X'[cvt(c1)@scaleFactor, C, U -> W],
                                 X'[cvt(c1)@currency, C, W -> V].
    X' : moneyAmt |-
        X'[cvt(c1)@scaleFactor, C, U -> V] <- X'[scaleFactor(c1) -> _[value(c1) -> F1]],
                                              X'[scaleFactor(C) -> _[value(c1) -> F]], V = U * F / F1.
    X' : moneyAmt |-
        X'[cvt(c1)@currency, C, U -> V] <- X'[currency(c1) -> Y1'], X'[currency(C) -> Y'],
                                           [...]
    X' : moneyAmt |-
        X'[cvt(c1)@currency, C, U -> V] <- X'[currency(c1) -> Y1'], X'[currency(C) -> Y'],
                                           Y1' ≠ Y', r3'(Yf', Yt', R'), Yf' = Y', Yt' = Y1',
                                           R'[value(c1) -> R], V = U * R.

Context c2:

    /* modifier value assignments */
    X' : companyFinancials |- X'[scaleFactor(c2) -> scaleFactor(c2, X')].
    X' : moneyAmt, scaleFactor(c2, X') : number |- scaleFactor(c2, X')[value(c2) -> 1].
    X' : companyFinancials |- X'[currency(c2) -> currency(c2, X')].
    X' : moneyAmt, currency(c2, X') : currencyType |-
        currency(c2, X')[value(c2) -> 'USD'].

    /* conversion definitions are similar to c1 and omitted for brevity */

Figure 5: Context sets for C for the motivational example at hand.
\[
\quad ( F'[value(c_1) \rightarrow 1] \leftarrow X'[currency(c_1) \rightarrow Y'] \land Y' \neq \text{'JPY'} ).
\]

The above formula is not in clausal form, but can be transformed to definite Horn clauses by
Skolemizing the existentially quantified variable $F'$. For example, the above formula can be
reduced to the following clauses:

\[
\begin{array}{l}
X' : companyFinancials \vdash \\
\quad X'[scaleFactor(c_1) \rightarrow f_{scaleFactor(c_1)}(X')]. \\
X' : companyFinancials,\; f_{scaleFactor(c_1)}(X') : number \vdash \\
\quad f_{scaleFactor(c_1)}(X')[value(c_1) \rightarrow 1000] \leftarrow X'[currency(c_1) \rightarrow Y'],\; Y' = \text{'JPY'}. \\
X' : companyFinancials,\; f_{scaleFactor(c_1)}(X') : number \vdash \\
\quad f_{scaleFactor(c_1)}(X')[value(c_1) \rightarrow 1] \leftarrow X'[currency(c_1) \rightarrow Y'],\; Y' \neq \text{'JPY'}.
\end{array}
\]

where $f_{scaleFactor(c_1)}$ is a unique Skolem function; for notational simplicity, we replace
$f_{scaleFactor(c_2)}(X')$ with the term $scaleFactor(c_2, X')$. Currency values corresponding to
instances of companyFinancials are obtained directly from the extensional relation $r_1$ as shown
in Figure 5. In this instance, it is necessary to reference an extensional relation because
"meta-data" are represented along with "data" in a source. In a "better-behaved" situation (such
as context $c_2$), the modifier values for currency and scaleFactor can be defined independently
of the underlying schema. It is worthwhile to note that our framework is sufficiently
expressive to capture both types of scenario, although the first tends to make the boundary between
intensional and extensional knowledge more fuzzy.
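
The contrast between the two contexts can be illustrated with a small Python sketch. It is an informal rendering of the modifier value assignments, not the COIN clauses themselves: the functions `currency` and `scale_factor` are hypothetical, they are keyed by company name rather than by Skolem term, and the relation $r_1$ is copied from Figure 3.

```python
# Extensional relation r1 (Figure 3): (cname, revenue, currency).
R1 = [("IBM", 1_000_000, "USD"), ("NTT", 1_000_000, "JPY")]


def currency(context, company):
    """Currency modifier of a company's financials in the given context."""
    if context == "c1":
        # In c1 the currency is stored alongside the data, so it must be
        # looked up in the extensional relation r1.
        for cname, _revenue, cur in R1:
            if cname == company:
                return cur
        raise KeyError(company)
    if context == "c2":
        # In c2 the currency is fixed independently of any schema.
        return "USD"
    raise ValueError(f"unknown context: {context}")


def scale_factor(context, company):
    """Scale-factor modifier of a company's financials in the given context."""
    if context == "c1":
        # 1000 when reported in JPY, 1 otherwise.
        return 1000 if currency("c1", company) == "JPY" else 1
    if context == "c2":
        return 1
    raise ValueError(f"unknown context: {context}")


print(scale_factor("c1", "NTT"), scale_factor("c1", "IBM"))  # 1000 1
print(currency("c2", "NTT"), scale_factor("c2", "NTT"))      # USD 1
```

Only the $c_1$ branch touches `R1`, which mirrors the remark that the boundary between intensional and extensional knowledge becomes less sharp in that case.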

Conversion functions define how the value of a given semantic-object can be derived in
the current context, given that its value is known with respect to a different context. As
shown in Figure 5, the first clause in the group (for context $c_1$) defines the conversion for
moneyAmt via the composition of atomic conversion functions for scaleFactor and currency.
The scaleFactor conversion is defined by identifying the respective scale-factors in the source
and target contexts and multiplying the value of the moneyAmt object by the ratio of the
two. The currency conversion is obtained by multiplying the source value by a conversion rate
which is obtained via a lookup on yet another data source ($r_3$). Notice that these conversions
are defined with respect to moneyAmt but are applicable to companyFinancials via behavioral
inheritance of the methods. In general, the repertoire of conversion functions can be extended
arbitrarily by defining the conversion externally and invoking the external functions using the
built-in system predicate which serves as an escape hatch to the operating system. However,
encapsulating the conversion in external functions makes it harder to reason about the properties
of the conversion; for example, the explicit treatment of arithmetic operators and table-lookups
(in conversion functions) allows us to exploit opportunities for optimization, say, by rewriting
the arithmetic expression to reduce the size of intermediary tables during query execution*.
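
The composition of the two atomic conversions can be traced numerically. The Python sketch below converts the revenue of NTT from context $c_1$ to context $c_2$; it is a simplified illustration in which the modifier values for this one object are assumed to be already known (table `MODIFIERS`), exchange rates are taken from $r_3$ in Figure 3, and exact rational arithmetic is used to avoid rounding. The function names are hypothetical.

```python
from fractions import Fraction

# Exchange rates from relation r3 (Figure 3), kept exact.
R3 = {
    ("USD", "JPY"): Fraction("104.0"),
    ("JPY", "USD"): Fraction("0.0096"),
}

# Assumed modifier values for the revenue of NTT in each context.
MODIFIERS = {
    "c1": {"scaleFactor": 1000, "currency": "JPY"},
    "c2": {"scaleFactor": 1, "currency": "USD"},
}


def cvt_scale_factor(value, source, target):
    """Multiply by the ratio of the source and target scale-factors."""
    ratio = Fraction(MODIFIERS[source]["scaleFactor"], MODIFIERS[target]["scaleFactor"])
    return value * ratio


def cvt_currency(value, source, target):
    """Multiply by the exchange rate looked up in r3 (identity if currencies agree)."""
    src = MODIFIERS[source]["currency"]
    tgt = MODIFIERS[target]["currency"]
    if src == tgt:
        return value
    return value * R3[(src, tgt)]


def cvt(value, source, target):
    """Composite conversion: scale-factor conversion followed by currency conversion."""
    return cvt_currency(cvt_scale_factor(value, source, target), source, target)


converted = cvt(1_000_000, "c1", "c2")
print(converted)  # 9600000
```

The result agrees with the value $9\,600\,000$ quoted earlier for the revenue of NTT in context $c_2$. Since both steps are multiplications, the order in which they are applied does not affect the result in this example.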

5     Query  Answering  as  Abductive  Inferences 

Following the same algorithm outlined in [1], any collection of COIN clauses can be translated to
Datalog with negation (Datalog$^{\neg}$) (or equivalently, a normal Horn program [34]), for which the
semantics as well as computation procedures have been widely studied [51]†. In this section,
we explore an alternative approach based on abductive reasoning. The abductive framework
provides us with intensional (as opposed to extensional) answers to a query‡. We describe this
abductive framework below and the relationship between query mediation in a COIN framework
and query answering in an abductive framework. In the interest of space, we assume some
familiarity with logic programming at the level of [34] in the ensuing discussion, and for the most
part, shall remain faithful to the notations therein.

5.1      The  Abductive  Framework 

Abduction  refers  to  a  particular  kind  of  hypothetical  reasoning  which,  in  the  simplest  case,  takes 
the  form: 

From observing $A$ and the axiom $B \rightarrow A$
Infer $B$ as a possible "explanation" of $A$.

Abductive logic programming (ALP) [27] is an extension of logic programming [34] to support
abductive reasoning. Specifically, an abductive framework [17] is a triple $\langle T, \mathcal{A}, I \rangle$ where
$T$ is a theory, $I$ is a set of integrity constraints, and $\mathcal{A}$ is a set of predicate symbols, called
abducible predicates. Given an abductive framework $\langle T, \mathcal{A}, I \rangle$ and a sentence $\exists X\, q(X)$ (the
observation), the abductive task can be characterized as the problem of finding a substitution $\theta$
and a set of abducibles $\Delta$, called the abductive explanation for the given observation, such that

(1) $T \cup \Delta \models q(X)\theta$,

(2) $T \cup \Delta$ satisfies $I$; and


* Details of query optimization strategies that take into account conversion functions are beyond the scope of
the work reported here. A more detailed discussion can be found in [13].

† The fact that "object-based logics" can be encoded in classical predicate logic has been known for a long
time (see for example, [9]). This however should not cause us to "lose faith" in our data model, since the syntax
of the language plays a pivotal role in shaping our conceptualization of the problem and in finding solutions at
the appropriate levels of abstraction.

‡ This change in perspective is beneficial for a variety of reasons (see Section 2), and will not be repeated here.
(3) $\Delta$ has some properties that make it "interesting".

Requirement (1) states that $\Delta$, together with $T$, must be capable of providing an explanation for
the observation $q(X)\theta$. The consistency requirement in (2) distinguishes abductive explanations
from inductive generalizations. Finally, in the characterization of $\Delta$ in (3), "interesting" means
primarily that literals in $\Delta$ are atoms formed from abducible predicates: where there is no
ambiguity, we refer to these atoms also as abducibles. In most instances, we would like $\Delta$ to
also be minimal or non-redundant.

Semantics and proof procedures for ALP have been active research topics recently (see [27]
and references therein). We describe in this section an abduction procedure based on extensions
to SLD resolution, called SLD+Abduction. The underlying idea was first reported in [12], and
has inspired various extensions. The account we give here follows that in [46].

We first consider SLD resolution. Given a theory $T$ consisting of (definite) Horn clauses
and a goal clause $\leftarrow q(X)$, an SLD-refutation of $\leftarrow q(X)$ is a sequence of goal clauses
$\leftarrow G_0 (= q(X)); \leftarrow G_1; \ldots; \leftarrow G_n$ where $\leftarrow G_n$ is the empty clause (□) and each $\leftarrow G_{i+1}$ is obtained
from $\leftarrow G_i$ by resolving one of its literals (the selected literal) with one of the clauses in $T$.
Since there may be many clauses in $T$ which can be resolved with the selected literal, a space
of possible refutations is defined (in the form of an SLD-tree). The search space defined by an
SLD-tree may be searched in a number of ways (e.g., in a depth-first manner).

Suppose now that there is some $\leftarrow G_i$ whose selected literal $g$ will not resolve with any
clause in $T$. This means that the part of the subtree with $\leftarrow G_i$ at the root is not worth
exploring any further, since it will not contain any branch that leads to a refutation (i.e., one
which terminates in an empty clause). Given however that we are searching for a set of unit
clauses $\Delta$ such that $T \cup \Delta \models G$, then clearly by letting $\Delta$ include a unit clause which resolves
with $g$, we can continue the search with $\leftarrow G_{i+1}$, which is obtained from $\leftarrow G_i$ minus the literal
$g$. This observation forms the basis for the SLD+Abduction procedure which we proceed to
describe below.

Given an abductive framework $\langle T, \mathcal{A}, I \rangle$ and the abductive query $q(X)$, consider the
sequence given by

\[
\begin{array}{l}
\leftarrow G_0, \Delta_0 \quad \text{where } G_0 = q(X) \text{ and } \Delta_0 \text{ is the empty set} \\
\quad \vdots \\
\leftarrow G_n, \Delta_n
\end{array}
\]

such that $G_{i+1}, \Delta_{i+1}$ is derived from $G_i, \Delta_i$ as follows:

• if $g$, the selected literal of $\leftarrow G_i$, can be resolved with a clause in $T$, then a single resolution
step is taken to yield $G_{i+1}$, and $\Delta_{i+1} = \Delta_i$;

• if $g$ is abducible, $g'$ is $g$ with all its variables replaced by Skolem constants, and
$T \cup \Delta_i \cup \{g' \leftarrow\}$ is consistent with $I$, then $G_{i+1}$ is $G_i$ less $g$, and $\Delta_{i+1} = \Delta_i \cup \{g' \leftarrow\}$.

The sequence obtained is said to be a derivation of $G$ with respect to the abductive framework
$\langle T, \mathcal{A}, I \rangle$. A derivation, as we have just defined, is said to be a refutation if $\leftarrow G_n$ is the
empty clause. The accumulated set of unit clauses $\Delta_n$ is said to be the residue corresponding to
this refutation, and constitutes the abductive answer to $\forall(q(X)\theta)$, where $\theta$ is the substitution
obtained from the composition of all substitutions leading to the refutation, restricted to the
variables $X$.
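
The two derivation steps can be sketched for the propositional case, where no unification or Skolemization is needed. The Python code below is a minimal illustration under several simplifying assumptions that are not taken from the account above: atoms are plain strings, the program is non-recursive (so the search terminates), integrity constraints are denials over abducible atoms only, and the leftmost literal is always selected. The names `abduce` and `consistent` are hypothetical.

```python
def consistent(delta, denials):
    """A residue is consistent if it does not contain any denial in full."""
    return not any(denial <= delta for denial in denials)


def abduce(goals, delta, rules, abducibles, denials):
    """Yield every residue under which the list of goals can be refuted."""
    if not goals:
        # Empty goal clause: the derivation is a refutation.
        yield delta
        return
    selected, rest = goals[0], goals[1:]
    # Resolution step: replace the selected literal by a clause body;
    # the residue is left unchanged.
    for body in rules.get(selected, []):
        yield from abduce(list(body) + rest, delta, rules, abducibles, denials)
    # Abduction step: assume the selected literal if it is abducible and
    # the extended residue still satisfies the integrity constraints.
    if selected in abducibles:
        extended = delta | {selected}
        if consistent(extended, denials):
            yield from abduce(rest, extended, rules, abducibles, denials)


# Theory T: q <- p, r.   p <- a.   r <- b.   r <- c.
RULES = {"q": [["p", "r"]], "p": [["a"]], "r": [["b"], ["c"]]}
ABDUCIBLES = {"a", "b", "c"}
# Integrity constraint: a and b may not both hold.
DENIALS = [frozenset({"a", "b"})]

residues = list(abduce(["q"], frozenset(), RULES, ABDUCIBLES, DENIALS))
print(residues)
```

In this example the branch that abduces `b` is pruned by the integrity constraint, so the only residue returned is the set containing `a` and `c`.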

In the abduction step above, we require the selected literal $g$ to be Skolemized. This
is because variables in the unit clause "$g' \leftarrow$" need only be existentially quantified for it to be
resolvable with $g$. If the Skolemization is not done, the abduced fact "$g' \leftarrow$" (where $g' = g$)
would have been unnecessarily strong. This Skolemization, however, introduces additional
complexity since it becomes necessary to deal with equality constraints on Skolem constants.
This is due to the fact that a Skolem constant ($sk$) introduced earlier in the SLD-derivation
(say in $\leftarrow G_i$) may have to be unified with a specific term ($t$) later on (in $\leftarrow G_j$, where $j > i$).
In [16], it is suggested that this can be dealt with by introducing the equality predicate as an
abducible predicate and adding the theory of Free Equality (FEQ) [10] as integrity constraints.
Thus, when a Skolem constant $sk$ is to be unified with a term $t$, the equality fact $sk = t$ is
abduced explicitly and the consistency of $sk = t$ with other abduced facts and FEQ is checked.

The procedure which we have just described can be extended to cope with negation through
the use of negation-as-failure [17]. Suppose that the selected literal of the current goal clause
is not $g$. The usual negation-as-failure mechanism is used: i.e., if $g$ cannot be proven from the
theory (augmented with the current residue), then not $g$ is assumed to be true. There are two
sources of complications in this scheme. First, it may happen that $g$ becomes provable later in
the refutation when additional facts are abduced. To avoid this, not $g$ needs to be recorded
so that new clauses which are subsequently added do not violate this implicit assumption.
Second, negation may be nested. Suppose there is a clause given by $g \leftarrow$ not $h$, and that $h$ is
not provable from the current residue. Then an attempt to prove not $g$ using SLD-resolution
with negation-as-failure (SLDNF) will fail because it is not possible to prove $h$. However, $h$
might be rendered provable by adding further clauses to the residue. So rather than using
SLD-resolution to try to show $h$, abduction is used instead and is allowed to add to the residue.
This procedure can be generalized to any level of nesting, with SLD being used at even levels,
and abduction at odd levels.
5.2 Query Answering in the COIN Framework

Figure  6  illustrates  how  queries  are  evaluated  in  a  Context  Interchange  system.  From  a  user 
perspective,  queries  and  answers  are  couched  in  the  relational  data  model:  a  (data-level  or 
knowledge-level)  query  is  formulated  using  a  relational  query  language  (SQL  or  some  extension 
thereof),  and  answers  can  either  be  intensional  (a  mediated  query)  or  extensional  (actual  tuples 
satisfying  the  query).  Examples  of  these  queries  and  answers  have  been  presented  earlier  in 
Section  2.1. 
Figure 6: A summary of how queries are processed within the Context Interchange strategy: (1) transforms an (extended) SQL query to a well-formed COIN query; (2) performs the COIN to Datalog$^{\mathrm{neg}}$ translation; (3) is the abduction computation which generates an abductive answer corresponding to the given query; and (4) transforms the answer from clausal form back to SQL.

Transformation to the COIN Framework

Within the COIN framework, the SQL-like queries originating from users are translated to a clausal representation in the COIN language. For example, queries Q1 and Q2 in Section 2 can be mapped to the following clausal representations:

CQ1:  $\leftarrow \mathit{answer}(N, R).$
      $\mathit{answer}(N, R) \leftarrow r_1(N, R, \_),\ r_2(N, E),\ R > E.$

and correspondingly,

CQ2:  $\leftarrow \mathit{answer}(N, F_1, F_2).$
      $\mathit{answer}(N, F_1, F_2) \leftarrow r_1(N, R, \_),\ R[\mathit{scaleFactor}(c_1) \rightarrow F_1],\ R[\mathit{scaleFactor}(c_2) \rightarrow F_2],\ F_1 \neq F_2.$

The above queries, however, do not capture the real intent of the user. For example, they do not recognize that "revenue" and "expenses" have different currencies and scale-factors associated with them and should not be compared "as is", that $R$ in CQ2 is a primitive-object for which the method scaleFactor is not defined, or that both queries originate from context $c_2$ and may be interpreted differently in a different context. We say that these queries are "naive", and thus must be translated to corresponding "well-formed" queries.

Definition 7 Let <Q, c> be a naive query in a COIN framework $\mathcal{F}$, where c denotes the context from which the query originates. The well-formed query Q' corresponding to <Q, c> is obtained by the following transformations:

• replace all relational operators with their "semantic" counterparts; for example, $X > Y$ is replaced with $X \overset{c}{>} Y$.

• make all relational "joins" explicit by replacing shared variables with explicit equality using the semantic-operator $\overset{c}{=}$; for example, $r_1(X, Y), r_2(X, Z)$ would be replaced with $r_1(X_1, Y), r_2(X_2, Z), X_1 \overset{c}{=} X_2$.

• similarly, make relational "selections" explicit; thus, $r_1(X, a)$ will be replaced by $r_1(X, Y), Y \overset{c}{=} a$.

• replace all references to extensional relations with the corresponding semantic-relations; for example, $r_1(X, Y)$ will be replaced with $r'_1(X, Y)$.

• append to the query constructed so far value atoms that return the value of the data elements that are requested in the query. □
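
As a rough illustration of the second and fourth transformations (explicit joins and semantic-relations), the following Python sketch rewrites a list of relational atoms. The encoding is an assumption made for this example only: an atom is a pair of a relation name and a list of variable names, `_` is the anonymous variable, a semantic equality is the tuple `("=", context, left, right)`, and a semantic-relation is written by appending a prime to the relation name. The function name `make_joins_explicit` is hypothetical, and the sketch does not guard against a renamed variable such as `X1` clashing with a variable already present in the query.

```python
from collections import Counter


def make_joins_explicit(atoms, context):
    """Rename shared variables apart and emit explicit semantic equalities.

    atoms   -- list of (relation, [variable, ...]) pairs; "_" is anonymous
    context -- name of the context in which the equalities are evaluated
    """
    # Count how often each named variable occurs across the whole query body.
    counts = Counter(v for _, args in atoms for v in args if v != "_")
    occurrence = Counter()
    rewritten, equalities = [], []
    for relation, args in atoms:
        new_args = []
        for v in args:
            if v == "_" or counts[v] == 1:
                new_args.append(v)  # not a join variable: keep as is
                continue
            occurrence[v] += 1
            renamed = f"{v}{occurrence[v]}"
            new_args.append(renamed)
            if occurrence[v] > 1:
                # Tie every later occurrence back to the first one.
                equalities.append(("=", context, f"{v}1", renamed))
        # A trailing prime marks the semantic-relation.
        rewritten.append((relation + "'", new_args))
    return rewritten, equalities
```

Applied to `[("r1", ["X", "Y"]), ("r2", ["X", "Z"])]` in context `c2`, the sketch returns the atoms `r1'(X1, Y)` and `r2'(X2, Z)` together with the single equality between `X1` and `X2`, mirroring the example given in Definition 7.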
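Based on the above transformations, the well-formed queries corresponding to the naive queries <CQ1, $c_2$> and <CQ2, $c_2$> are given by

CQ1':  $\leftarrow \mathit{answer}(N, R).$
       $\mathit{answer}(N, R) \leftarrow r'_1(N'_1, R', \_),\ r'_2(N'_2, E'),\ N'_1 \overset{c_2}{=} N'_2,\ R' \overset{c_2}{>} E',\ N'_1[\mathit{value}(c_2) \rightarrow N],\ R'[\mathit{value}(c_2) \rightarrow R].$

and

CQ2':  $\leftarrow \mathit{answer}(N, F_1, F_2).$
       $\mathit{answer}(N, F_1, F_2) \leftarrow r'_1(N', R', \_),\ R'[\mathit{scaleFactor}(c_1) \rightarrow F'_1],\ R'[\mathit{scaleFactor}(c_2) \rightarrow F'_2],\ F'_1 \overset{c_2}{\neq} F'_2,\ N'[\mathit{value}(c_2) \rightarrow N],\ F'_1[\mathit{value}(c_2) \rightarrow F_1],\ F'_2[\mathit{value}(c_2) \rightarrow F_2].$

respectively.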

Transformation to an Abductive Framework

The  relationship  between  a  COIN  framework  and  an  abduction  framework  can  now  be  stated. 

Definition 8 Given the COIN framework $\mathcal{F}_C = \langle \mathcal{S}, \mu, \mathcal{E}, \mathcal{D}, \mathcal{C} \rangle$, this can be mapped to a corresponding abductive framework $\mathcal{F}_A$ given by $\langle T, I, A \rangle$ where

• $T$ is the Datalog$^{\mathrm{neg}}$ translation of the set of clauses given by $\mathcal{E} \cup \mathcal{D} \cup \mathcal{C} \cup \mu$;

• $I$ consists of the integrity constraints defined in $\mathcal{S}$, augmented with Clark's Free Equality Axioms [10]; and

• $A$ consists of the extensional predicates defined in $\mathcal{S}$, the built-in predicates corresponding to arithmetic and relational (comparison) operators, and the system predicate which provides the interface for system calls. □

Suppose $\leftarrow q(X)$ is a well-formed query in the COIN framework $\mathcal{F}_C$, the corresponding abductive framework of which is denoted by $\mathcal{F}_A = \langle T, I, A \rangle$. Without any loss of generality, we assume that $\leftarrow q(X)$ is identical in both $\mathcal{F}_C$ and $\mathcal{F}_A$. This is because Datalog$^{\mathrm{neg}}$ is a sublanguage of COIN, and any COIN query $\leftarrow Q(X)$ can always be transformed to a Datalog$^{\mathrm{neg}}$ query $\leftarrow q(X)$ by adding the Datalog$^{\mathrm{neg}}$ translation of the clause $q(X) \leftarrow Q(X)$ into the theory $T$.

Consider an abductive framework $\langle T, I, A \rangle$ and the query $\exists X\, q(X)$. Suppose $\Delta = \{p_1, \ldots, p_m\}$ is an abductive answer for $q(X)\theta$; then it follows that

$T \models (q(X)\theta \leftarrow p_1, \ldots, p_m)$
This result follows from the fact that the $p_i$'s are ground for $i = 1, \ldots, m$, so a set of ground atoms in fact represents their conjunction. The conjunct $p_1 \wedge \cdots \wedge p_m$ constitutes a precondition for $q(X)\theta$. Suppose $K = \{sk_0, \ldots\}$ is the set of Skolem constants introduced by the abduction step, and $\varphi$ is a "reverse" substitution $\{sk_i/Y_i\}$ where $sk_i \in K$ and $Y_i$ is a distinct variable not in $X$. Then, we say that the tuple $(\exists Y\,(p_1, \ldots, p_m)\varphi,\ \theta\varphi)$ is an intensional answer for the query $\exists X\, q(X)(\theta\varphi)$. This fact is not surprising given that Motro and Yuan [39] suggested that intensional answers can be obtained from the "dead-ends" of "derivation trees" corresponding to a query. Although it was not recognized as such, the procedure described in [39] is in fact a naive implementation of SLD+Abduction (without any consistency checking). From the perspective of the user issuing a naive query, the intensional answer can also be interpreted as the corresponding mediated answer.

As an illustration of the preceding comments, the evaluation of CQ2' in the abductive framework yields the following abductive answer:

$\Delta = \{r_1(sk_0, sk_1, sk_2),\ sk_2 = \text{'JPY'}\},\quad \theta = \{N/sk_0,\ F_1/1000,\ F_2/1\}$

The reverse substitution $\varphi$ is given by $\{sk_0/Y_0,\ sk_1/Y_1,\ sk_2/Y_2\}$, and thus the intensional answer (equivalently, the mediated query) is:

$(\exists Y_0, Y_1, Y_2\,(r_1(Y_0, Y_1, Y_2),\ Y_2 = \text{'JPY'}),\ \{N/Y_0,\ F_1/1000,\ F_2/1\})$

which translates to MQ2 shown in Section 2. If $\{Y_0/\text{'NTT'},\ Y_1/1\,000\,000,\ Y_2/\text{'JPY'}\}$ is an answer for the above mediated query, then the answer for the original user query is given by $\{N/\text{'NTT'},\ F_1/1000,\ F_2/1\}$.
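
The step from an abductive answer to an intensional answer is mechanical, and a small Python sketch may make it concrete. The representation is assumed for illustration only: an atom is a tuple whose first element is the predicate name, Skolem constants are strings of the form `sk0`, `sk1`, ..., and the reverse substitution maps `sk<i>` to the variable `Y<i>`. The function name `intensional_answer` is hypothetical.

```python
import re

SKOLEM = re.compile(r"^sk(\d+)$")  # assumed naming scheme for Skolem constants


def intensional_answer(residue, theta):
    """Apply the reverse substitution sk_i -> Y_i to a residue and to theta.

    residue -- list of atoms, each a tuple (predicate, term, ...)
    theta   -- dict mapping answer variables to terms
    Returns (existential variables, rewritten residue, rewritten theta).
    """
    introduced = set()

    def reverse(term):
        match = SKOLEM.match(term) if isinstance(term, str) else None
        if match is None:
            return term  # ordinary constants are left untouched
        introduced.add(int(match.group(1)))
        return f"Y{match.group(1)}"

    body = [(atom[0],) + tuple(reverse(t) for t in atom[1:]) for atom in residue]
    bindings = {var: reverse(term) for var, term in theta.items()}
    quantified = [f"Y{i}" for i in sorted(introduced)]
    return quantified, body, bindings
```

With the residue and substitution of the CQ2' example as input, the sketch yields the existential variables `Y0`, `Y1`, `Y2`, the body `r1(Y0, Y1, Y2), Y2 = 'JPY'`, and the bindings `N/Y0`, `F1/1000`, `F2/1`.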

5.3     Illustrative  Example 

In  this  section,  we  provide  an  example  illustrative  of  the  computation  involved  in  query  medi- 
ation (equivalently,  obtaining  the  intensional  answer  to  a  query). 

Consider the query Q3 (a simplified variant of Q2), which is issued from context $c_1$ and queries relation $r_1$ for the scale-factors of revenues in context $c_1$:

Q3:       SELECT r1.cname, r1.revenue.scaleFactor IN c1
          FROM r1;

The (well-formed) clausal representation for this query is given by

CQ3:     $\leftarrow \mathit{answer}(N, F).$
         $\mathit{answer}(N, F) \leftarrow r'_1(N', R', \_),\ N'[\mathit{value}(c_1) \rightarrow N],\ R'[\mathit{scaleFactor}(c_1) \rightarrow F'],\ F'[\mathit{value}(c_1) \rightarrow F].$
Figure 7 shows one possible refutation of this query using the SLD+Abduction algorithm described earlier. For better clarity, the refutation is shown using COIN clauses rather than Datalog$^{\mathrm{neg}}$. The clauses used for resolving the goal clauses are those shown earlier in Figures 3, 4 and 5.

To aid in appreciating the chain of reasoning, we offer the following highlights on the refutation:

• The refutation begins with the query as given, with $\Delta$ initialized to the empty set.

• At step (3), the literal $r_1(N, \_, \_)$ cannot be further resolved. Since $r_1$ is an extensional predicate (and hence abducible), it is removed from the goal clause and its Skolemized form, $r_1(sk_0, sk_1, sk_2)$, is added to $\Delta$.

• At step (6), the literal $\mathit{scaleFactor}(c_1, f_{r_1\#\mathit{revenue}}(sk_0))[\mathit{value}(c_1) \rightarrow F]$ can be resolved with two different clauses (where $F = 1$ and $F = 1000$). One is chosen arbitrarily (in this case, $F = 1$); the other will be selected on backtracking and will eventually lead to another refutation.

• To arrive at a successful refutation, the currency for the revenue-object at hand must not be 'JPY' when evaluated in context $c_1$ (see step (8)). To determine if this is the case, it is necessary to identify the currency value from the extensional relation $r_1$ (see corresponding axiom for assigning currency values in Figure 5). This eventually leads to the expansion of the goal clause as shown in step (10).

• In step (12), the extensional relation is referenced again. In the absence of other information, we are not allowed to assume that it is the same "fact" which has been abducted: i.e., we will need to add a new Skolemized fact, $r_1(sk_3, sk_4, sk_5)$, to $\Delta$.

• In step (15), the equality constraint on the objects $f_{r_1\#\mathit{cname}}(sk_0)$ and $f_{r_1\#\mathit{cname}}(sk_3)$ leads to the constraint $sk_0 = sk_3$. Since '=' is abducible (it is an evaluable predicate), it is added to $\Delta$. At this point, the functional dependency cname $\rightarrow$ {revenue, currency} generates the further constraints $sk_1 = sk_4$ and $sk_2 = sk_5$, which in turn allow us to merge the two facts $r_1(sk_0, sk_1, sk_2)$ and $r_1(sk_3, sk_4, sk_5)$.

• Finally, in step (17), the literal $sk_2 \neq \text{'JPY'}$ is abducted, which leads to a refutation. The abductive answer corresponding to this refutation is given by $\Delta = \{r_1(sk_0, sk_1, sk_2),\ sk_2 \neq \text{'JPY'}\}$. The substitution, restricted to variables $\{N, F\}$, is given by $\{N/sk_0,\ F/1\}$.

This  intensional  answer,  translated  to  SQL,  is  given  by: 
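(1)   $\leftarrow \mathit{answer}(N, F).$    $\Delta_0 = \{\}$
        ↓
(2)   $\leftarrow r'_1(N', R', \_),\ N'[v(c_1) \rightarrow N],\ R'[\mathit{sf}(c_1) \rightarrow F'],\ F'[v(c_1) \rightarrow F].$
        ↓   $N'/f_{r_1\#cn}(N),\ R'/f_{r_1\#rv}(N)$
(3)   $\leftarrow r_1(N, \_, \_),\ f_{r_1\#cn}(N)[v(c_1) \rightarrow N],\ f_{r_1\#rv}(N)[\mathit{sf}(c_1) \rightarrow F'],\ F'[v(c_1) \rightarrow F].$
      ☞ ↓   $\Delta_1 = \{r_1(sk_0, sk_1, sk_2)\},\ N/sk_0$
(4)   $\leftarrow f_{r_1\#cn}(sk_0)[v(c_1) \rightarrow sk_0],\ f_{r_1\#rv}(sk_0)[\mathit{sf}(c_1) \rightarrow F'],\ F'[v(c_1) \rightarrow F].$
        ↓
(5)   $\leftarrow f_{r_1\#rv}(sk_0)[\mathit{sf}(c_1) \rightarrow F'],\ F'[v(c_1) \rightarrow F].$
        ↓   $F'/\mathit{sf}(c_1, f_{r_1\#rv}(sk_0))$
(6)   $\leftarrow \mathit{sf}(c_1, f_{r_1\#rv}(sk_0))[v(c_1) \rightarrow F].$
        ↓   $F/1$
(7)   $\leftarrow f_{r_1\#rv}(sk_0)[\mathit{cr}(c_1) \rightarrow Y'],\ Y' \overset{c_1}{\neq} \text{'JPY'}.$
        ↓   $Y'/\mathit{cr}(c_1, f_{r_1\#rv}(sk_0))$
(8)   $\leftarrow \mathit{cr}(c_1, f_{r_1\#rv}(sk_0)) \overset{c_1}{\neq} \text{'JPY'}.$
        ↓
(9)   $\leftarrow \mathit{cr}(c_1, f_{r_1\#rv}(sk_0))[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
        ↓
(10)  $\leftarrow f_{r_1\#rv}(sk_0)[\mathit{cp} \rightarrow N'_0],\ r'_1(N'_1, \_, Y'),\ N'_0 \overset{c_1}{=} N'_1,\ Y'[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
        ↓   $N'_0/f_{r_1\#cn}(sk_0)$
(11)  $\leftarrow r'_1(N'_1, \_, Y'),\ f_{r_1\#cn}(sk_0) \overset{c_1}{=} N'_1,\ Y'[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
        ↓   $N'_1/f_{r_1\#cn}(N_1),\ Y'/f_{r_1\#cr}(N_1)$
(12)  $\leftarrow r_1(N_1, \_, \_),\ f_{r_1\#cn}(sk_0) \overset{c_1}{=} f_{r_1\#cn}(N_1),\ f_{r_1\#cr}(N_1)[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
      ☞ ↓   $\Delta_2 = \Delta_1 \cup \{r_1(sk_3, sk_4, sk_5)\},\ N_1/sk_3$
(13)  $\leftarrow f_{r_1\#cn}(sk_0) \overset{c_1}{=} f_{r_1\#cn}(sk_3),\ f_{r_1\#cr}(sk_3)[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
        ↓
(14)  $\leftarrow f_{r_1\#cn}(sk_0)[v(c_1) \rightarrow S_0],\ f_{r_1\#cn}(sk_3)[v(c_1) \rightarrow S_1],\ S_0 = S_1,\ f_{r_1\#cr}(sk_3)[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
        ↓   $S_0/sk_0,\ S_1/sk_3$
(15)  $\leftarrow sk_0 = sk_3,\ f_{r_1\#cr}(sk_3)[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
      ☞ ↓   $\Delta_3 = \Delta_2 \cup \{sk_0 = sk_3\}$
(16)  $\leftarrow f_{r_1\#cr}(sk_0)[v(c_1) \rightarrow Y],\ Y \neq \text{'JPY'}.$
        ↓   $Y/sk_2$
(17)  $\leftarrow sk_2 \neq \text{'JPY'}.$
      ☞ ↓   $\Delta_4 = \Delta_3 \cup \{sk_2 \neq \text{'JPY'}\} = \{r_1(sk_0, sk_1, sk_2),\ sk_2 \neq \text{'JPY'}\}$
(18)  □

Figure 7: One possible refutation for query CQ3. Method and functor names are abbreviated where possible (e.g., cr = currency). The resolution step labeled ☞ is where a literal is abducted. The abductive answer corresponding to this refutation is given by $\Delta_4$, and the intensional answer by $(\Delta_4, \{N/sk_0,\ F/1\})$.
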
SELECT r1.cname, 1 FROM r1 WHERE r1.currency <> 'JPY';

On backtracking, the other solution corresponding to $F = 1000$ will be obtained. The complete answer returned to the user is thus given by:

MQ3:     SELECT r1.cname, 1 FROM r1 WHERE r1.currency <> 'JPY'
         UNION
         SELECT r1.cname, 1000 FROM r1 WHERE r1.currency = 'JPY';
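
The translation of a set of abductive answers into a mediated query of this shape can be sketched as follows. This Python fragment is a deliberately narrow illustration, not the prototype's compiler: it assumes that every branch abduces exactly one fact over a single relation, that the remaining residue consists of binary comparisons already written with SQL operators, and that the column names `cname`, `revenue` and `currency` of `r1` are supplied by the caller. The names `branch_to_sql` and `mediated_query` are hypothetical.

```python
def branch_to_sql(fact, constraints, theta, answer_vars, table, columns):
    """Render one abductive answer as a SELECT statement.

    fact        -- Skolem constants of the single abduced fact, in column order
    constraints -- list of (left, operator, right) comparisons from the residue
    theta       -- dict mapping answer variables to Skolem constants or values
    """
    # Each Skolem constant of the abduced fact stands for one column.
    column_of = {sk: f"{table}.{col}" for sk, col in zip(fact, columns)}

    def render(term):
        return column_of[term] if term in column_of else str(term)

    select = ", ".join(render(theta[v]) for v in answer_vars)
    sql = f"SELECT {select} FROM {table}"
    if constraints:
        where = " AND ".join(f"{render(l)} {op} {render(r)}" for l, op, r in constraints)
        sql += f" WHERE {where}"
    return sql


def mediated_query(branches, answer_vars, table, columns):
    """Combine one SELECT per refutation into a UNION query."""
    parts = [branch_to_sql(f, c, t, answer_vars, table, columns) for f, c, t in branches]
    return "\nUNION\n".join(parts) + ";"
```

Feeding in the two refutations of CQ3 (one with `sk2 <> 'JPY'` and `F/1`, the other with `sk2 = 'JPY'` and `F/1000`) reproduces the text of MQ3 up to whitespace.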

The correspondence between integrity checking and semantic query optimization can be clearly seen in the above example. At step (15), the functional dependency on $r_1$ allows the initial constraint ($sk_0 = sk_3$) to be propagated and eventually allows $r_1(sk_3, sk_4, sk_5)$ to be eliminated from the abductive answer. If it were not so, the intensional answer obtained would instead be:

         SELECT rel1.cname, 1 FROM r1 rel1, r1 rel2
         WHERE rel1.cname = rel2.cname;

which would include a redundant second reference to $r_1$. This second answer is unintuitive, and would obviously lead to suboptimal performance if executed without further optimization. In the more general scenario, constraints can be useful in pruning an entire refutation altogether. For instance, if Q3 had been:

Q3':     SELECT r1.cname IN c1, r1.revenue.scaleFactor IN c1
         FROM r1 WHERE r1.currency = 'JPY';

we would eventually end up trying to abduct $sk_2 = \text{'JPY'}$ where $sk_2 \neq \text{'JPY'}$ is already present in $\Delta$, thus resulting in an unsuccessful refutation. In this case, the mediated query MQ3' will consist of only the second select-statement in MQ3.
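
The propagation of equalities through a functional dependency, and the resulting merge of abduced facts, can be sketched in Python as below. The sketch makes several assumptions that are not part of the framework itself: all facts belong to one relation, the dependency is from a single key column to all other columns, Skolem constants are strings beginning with `sk`, and an inconsistency (two distinct ordinary constants forced to be equal) is signalled by raising `ValueError`. Disequalities such as $sk_2 \neq \text{'JPY'}$ are not handled here. The function name `propagate_fd` is hypothetical.

```python
def propagate_fd(facts, equalities, key=0):
    """Merge abduced facts of one relation under the FD key -> other columns.

    facts      -- list of tuples of terms (Skolem constants or ordinary constants)
    equalities -- list of (a, b) pairs already abduced
    key        -- index of the column that determines all other columns
    """
    parent = {}

    def is_skolem(term):
        return isinstance(term, str) and term.startswith("sk")

    def find(term):
        parent.setdefault(term, term)
        while parent[term] != term:
            term = parent[term]
        return term

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return False
        if not is_skolem(ra) and not is_skolem(rb):
            raise ValueError(f"inconsistent residue: {ra} = {rb}")
        # Prefer ordinary constants, then the lexicographically smaller Skolem name.
        if is_skolem(ra) and (not is_skolem(rb) or rb < ra):
            parent[ra] = rb
        else:
            parent[rb] = ra
        return True

    for a, b in equalities:
        union(a, b)

    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for i, f in enumerate(facts):
            for g in facts[i + 1:]:
                if find(f[key]) == find(g[key]):
                    for x, y in zip(f, g):
                        changed |= union(x, y)

    merged = []
    for f in facts:
        canonical = tuple(find(t) for t in f)
        if canonical not in merged:
            merged.append(canonical)
    return merged
```

Given the two facts of the example and the equality between `sk0` and `sk3`, the sketch returns the single fact `(sk0, sk1, sk2)`, which is why only one reference to `r1` appears in the mediated query.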

6     A  Meta-Logical  Extension  to  the  COIN  Framework 

In Section 4.3, context knowledge in a COIN framework is represented by a set of separate theories (i.e., $\mathcal{C} = \{c_1 := C_1, \ldots, c_n := C_n\}$). We describe here an extension to this basic framework which allows new contexts to be defined in terms of existing ones in an incremental fashion. Two basic mechanisms underlie this extension: the treatment of context as a set of parameterized statements, and the introduction of the hierarchical operator $\prec$, which defines a subcontext relation on the set $\{c_1, \ldots, c_n\}$.

Recall that the relative truth or falsity of a statement can be represented using McCarthy's ist, so that

$\mathit{ist}(c_i, \sigma)$

is taken to mean that the statement $\sigma$ is true in context $c_i$. The relation $\prec$ allows us to make incremental refinements to statements which describe what is already known about an enclosing context. Thus, if $c_i$ is a subcontext of $c_j$, denoted by $c_i \prec c_j$, this allows us to introduce a differential context denoted by $\delta_{c_i}$, such that:

$\mathit{ist}(c_i, \sigma) \leftarrow \sigma \in \delta_{c_i}$

$\mathit{ist}(c_i, \sigma) \leftarrow c_i \prec c_j,\ \mathit{ist}(c_j, \sigma),\ \textit{not-overridden}(\delta_{c_i}, \sigma).$

The predicate not-overridden indicates that the statement $\sigma$ obtained from the more general context $c_j$ is not explicitly overridden by the differential context. The composition of a new context theory of $c_i$ from $c_j$ and $\delta_{c_i}$ is similar to that accomplished by the isa operator defined in [4].

In the COIN data model, statements in a context are "decontextualized" by making explicit references to its reification in the form of a context-object. For example, the statement

$\mathit{ist}(c_j,\ t[m_1 \rightarrow t'] \leftarrow t[m_2 \rightarrow t']).$

can be equivalently stated as

$t[m_1(c_j) \rightarrow t'] \leftarrow t[m_2(c_j) \rightarrow t'].$

This second form simplifies the inferences which are undertaken to support context mediation, but requires some adjustment to allow statements to be inherited. Specifically, if the above statement is inherited by context $c_i$ ($\prec c_j$), we will need to replace the references to $c_j$ with $c_i$. This is accomplished by requiring all statements in $\delta_{c_i}$ to be parameterized; i.e.,

$\delta_{c_i}(X) = \{\sigma_1(X), \ldots, \sigma_l(X)\}$

For instance, the earlier statement would have been asserted as

$\sigma(X) = t[m_1(X) \rightarrow t'] \leftarrow t[m_2(X) \rightarrow t'].$

in the set $\delta_{c_j}$. The statement $\sigma(X)$ is said to be uninstantiated. The collection of uninstantiated axioms forms an uninstantiated context set.

Definition 9 Let $\delta\mathcal{C} = \{c_0 := C_0(X),\ \delta_{c_1}(X), \ldots, \delta_{c_n}(X)\}$, for which $\delta_{c_i}(X)$ ($i = 1, \ldots, n$) is said to be the differential for context $c_i$ with respect to $\prec$, which defines a partial order on the contexts $\{c_1, \ldots, c_n\}$. Let $\{c_{i_1}, \ldots, c_{i_k}\}$ be the predecessors of $c_i$ with respect to the subcontext relation $\prec$. Then the uninstantiated context set for $c_{i_j}$, denoted by $C_{i_j}(X)$, can be obtained from $C_i(X)$ as before: i.e.,

• $\sigma(X) \in C_{i_j}(X) \leftarrow \sigma(X) \in \delta_{c_{i_j}}(X)$

• $\sigma(X) \in C_{i_j}(X) \leftarrow \sigma(X) \in C_i(X),\ \textit{not-overridden}(\delta_{c_{i_j}}(X), \sigma(X)).$

The context $c_0$ is said to be the default context and forms the basis for the other differentials. □

Definition 10 Given $\delta\mathcal{C} = \{c_0 := C_0(X),\ \delta_{c_1}(X), \ldots, \delta_{c_n}(X)\}$ and the subcontext relation $\prec$. Suppose $C_i(X)$ is the uninstantiated context set for $c_i$ obtained inductively using Definition 9. The context set for $c_i$ is given by the set $C_i(c_i)$. □

Notice that we have not described how one is to determine whether or not a given statement is being overridden in a specific context. The simplest approach is to assume that whenever a method atom appears in the head of a statement in a differential context set, none of the other rules (pertaining to this method) defined in any of its supercontexts apply. For example, if the scaleFactor for the type companyFinancials is given in two distinct context differentials along a given path in the hierarchy, then the statement in the more specific context is said to take precedence and will be used in the corresponding context.
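
This simple overriding policy can be sketched in Python. The encoding below is an assumption made purely for illustration: each differential is a dictionary from a method name to a list of rule templates, a template is a string in which `{X}` marks the context parameter, and the subcontext relation is given as a dictionary from a context to its immediate supercontexts. The rule strings themselves are informal stand-ins for COIN clauses, and the function names are hypothetical. Where a context has several supercontexts defining the same method, the sketch simply lets the supercontext listed last win, a choice on which the text is silent.

```python
def uninstantiated_set(context, supercontexts, differentials):
    """Collect rule templates for `context`, most specific definition winning.

    supercontexts -- dict: context -> list of immediate supercontexts
    differentials -- dict: context -> {method name: [rule template, ...]}
    """
    rules = {}
    for parent in supercontexts.get(context, ()):
        # Inherit everything from the supercontext first ...
        rules.update(uninstantiated_set(parent, supercontexts, differentials))
    # ... then let the differential override per method name.
    rules.update(differentials.get(context, {}))
    return rules


def context_set(context, supercontexts, differentials):
    """Instantiate the templates by substituting the context for {X}."""
    templates = uninstantiated_set(context, supercontexts, differentials)
    return {method: [rule.format(X=context) for rule in rules]
            for method, rules in templates.items()}
```

For instance, if the default context `c0` defines both `scaleFactor` and `currency` while the differential for `c1` redefines only `scaleFactor`, then `context_set("c1", ...)` contains the `c1` rule for `scaleFactor` and the inherited `currency` rule, both instantiated with `c1`.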

The  above  scheme  leads  to  the  following  extended  formulation  of  a  COIN  framework. 

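Definition 11 The extended COIN framework is a sextuple given by $\langle \mathcal{S}, \mu, \mathcal{E}, \mathcal{D}, \delta\mathcal{C}, \prec \rangle$, where $\mathcal{S}$, $\mathcal{E}$, and $\mathcal{D}$ are defined as before in Definition 6, $\delta\mathcal{C}$ is as defined in Definition 9, and $\prec$ is the subcontext relation defined on the set of contexts $\{c_1, \ldots, c_n\}$ induced by $\delta\mathcal{C}$. □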

7     The  Context  Interchange  Prototype 

The feasibility and features of the proposed strategy have been demonstrated in a prototype which provides mediated access to both traditional structured databases and semi-structured data sources (web-sites). Our implementation leverages the World Wide Web (WWW) in a number of ways: for providing physical connectivity across different networks and platforms, for adopting a universal addressing scheme for different types of geographically-distributed resources, and for providing us with a wealth of heterogeneous data sources. As shown in Figure 8, queries submitted to the system are intercepted by a Context Mediator, which rewrites the user query to a mediated query. The Optimizer transforms this to an optimized query plan, which takes into account a variety of cost information. The optimized query plan is executed by an Executioner which dispatches subqueries to individual systems, collates the results, undertakes conversions which may be necessary when data are exchanged between two systems, and returns the answers to the receiver.
[Figure 8 diagram not reproduced. Recoverable labels: Context Mediation Services; Local DBMS supporting intermediate processing; subqueries and subquery answers; Local DBMS; Non-traditional Data Sources (e.g., web-pages).]

Figure 8: Architectural overview of the Context Interchange Prototype

The Context Mediator is implemented in ECLiPSe, which is an efficient and robust Prolog implementation distributed by the ECRC. At the heart of the Context Mediator is a meta-interpreter which implements the extended SLD+Abduction algorithm described in Section 5.1. Since computation of the abductive answer is performed within a Horn-clause (HC) framework, we need to translate both the user query and the COIN clauses to statements in Datalog$^{\mathrm{neg}}$ and, on obtaining the answer, perform the reverse translation to SQL. In the absence of aggregation operators, the SQL-to-HC and HC-to-SQL compilers are relatively straightforward since both of these languages share a common grounding in predicate calculus.

8     Conclusion 

We have presented a tightly-woven tapestry of ideas derived from different threads in the literature in artificial intelligence (on "contexts"), databases (on "heterogeneous databases" and "semantic query optimization"), logic programming (on "abductive logic programming" and "meta-logic"), and others which are already present at the confluence of different scholarly traditions (e.g., "deductive object-oriented data models" and "intensional answers"). The various results and insights are integrated in a formal framework for the Context Interchange strategy, and provide a well-founded basis for representing and reasoning about data semantics in disparate sources and receivers. Specifically, we have described how data semantics in disparate systems can be articulated using an "object-logic", and how logical inferences (in particular, abduction) can be used to provide mediated access to both data and data-semantics. At the same time, we showed that the COIN framework presents a viable alternative to classical and contemporary integration approaches by allowing different kinds of information to be more easily accessed, by making possible the sustenance of an infrastructure that mitigates the complexity in the creation and maintenance of large-scale systems, and by isolating changes in different components which are only loosely-coupled together.

Footnote (ECLiPSe): The ECRC Constraint Logic Parallel System. More information can be obtained at http://www.ecrc.de/eclipse/.

This  paper  is  by  no  means  the  last  word  on  Context  Interchange.  On  the  contrary,  there 
are  many  interesting  issues  which  we  are  only  beginning  to  explore.  We  mention  below  two  of 
these  undertakings. 

As noted in [35], the autonomy and heterogeneity of sources present new challenges for query processing and optimization which are not the same as those in distributed database systems. These differences stem from constraints which are characteristic of the underlying environment; for example, different sources may differ in their query-handling ability, cost models may not be known, and data conversions may incur large hidden costs which were not previously accounted for. As we have shown earlier, the detection of unsatisfiable answers in the abductive framework constitutes a form of semantic query optimization which presents huge payoffs. We recently embarked on a re-implementation of the abduction procedure using Constraint Handling Rules [19], and have found much synergy between the abduction framework, semantic query optimization, and constraint logic programming [25] on the premise of similar observations which motivated the work recently presented in [53]. To this end, we have been able to make use of the existing prototype as a testbed on which theoretical insights can be rapidly implemented and experimented with.

The richness of the representational formalism is a two-edged sword since it also presents greater scope for abuse. While it is unlikely that there will ever be a "definitive guide" to context modeling, case studies, evaluation criteria, prescriptive guidelines, and tools are sorely needed. At this moment, we are working with several industry information-providers in applying this mediation technology to the "real world" problems they encounter. We are hopeful that these experiences will be instrumental in developing and validating integration methodologies that are grounded in practice.